Compare commits

...

461 Commits

Author SHA1 Message Date
David Wiesen
b6b5ce9e53 fix(windows): avoid WindowsApps shell aliases in sandbox 2026-05-01 10:15:46 -07:00
pakrym-oai
9b8d585075 [codex] Add Codex environment config (#20630)
## Why

This adds a checked-in Codex environment configuration so the repo
exposes a ready-to-run Codex action from the app environment metadata.

## What changed

- Added `.codex/environments/environment.toml` with a generated `Run`
action.
- The action runs the `codex` binary from `codex-rs/Cargo.toml` with
`mcp_oauth_credentials_store=file`.

## Verification

- Not run; configuration-only change.
2026-05-01 10:01:45 -07:00
Eric Traut
6784db51c0 Add /ide context support to the TUI (#20294)
## Why

Users have asked for a `/ide` command in the TUI so Codex can use the
active IDE session for live context such as the current file, open tabs,
and selected ranges. We already support a similar feature in the Codex
desktop app, so bringing it to the TUI makes sense.

One subtle compatibility constraint is that the injected prompt wrapper
and transcript stripping should match the desktop app and IDE extension.
By using the same `## My request for Codex:` delimiter and hiding the
injected context from transcript rendering the same way, threads created
in the TUI render correctly in desktop and IDE surfaces, and threads
created there replay correctly in the TUI, even when IDE context was
included.

Addresses https://github.com/openai/codex/issues/13834.

## What changed
### Summary
This PR consists of four four pieces:
1. An IPC client that uses a socket (Mac/Linux) or named pipe (Windows)
to talk to the IDE Extension
2. Logic that establishes the IPC connection and requests IDE context
(open files, selection) on demand
3. Logic that injects this context into the user prompt (using the same
technique as the desktop app) and hides the added context when rendering
the prompt in the TUI transcript
4. A new slash command for enabling/disabling this mode and text within
the footer to indicate when it's enabled

### Details
- Added `/ide [on|off|status]` to the TUI, with bare `/ide` toggling IDE
context on or off.
- Added a Rust IDE context client that connects to the local Codex IDE
IPC route as a client and requests context from the IDE extension flow.
- Injected IDE context using the same prompt delimiter and
transcript-stripping convention as the desktop app and IDE extension so
shared threads render consistently across surfaces.
- Added an `IDE context` status-line indicator while the feature is
active and cleared it when enabling or fetching context fails.
- Added handling for multiple selection ranges, oversized selections,
interleaved IPC messages, and transient reconnect timing after quick
toggles.

## Verification

Did extensive manual testing in addition to running automated unit and
regression tests.

To test:

- Launch VS Code (or Cursor) with the IDE extension.
- Open one or more files in the IDE and select a range of text within
one of them.
- Start the TUI.
- Ask the agent which files you have open in your IDE, and it should say
that it does not know.
- Enable `/ide` mode; note that `IDE context` appears in the lower
right.
- Ask the agent what files you have open in your IDE and what text is
selected.
2026-05-01 09:39:48 -07:00
Ruslan Nigmatullin
41e171fcf2 app-server: move transport into dedicated crate (#20545)
## Why

`codex-app-server` currently owns both request-processing code and
transport implementation details. Splitting the transport layer into its
own crate makes that boundary explicit, reduces the amount of
transport-specific dependency surface carried by `codex-app-server`, and
gives future transport work a narrower place to evolve.

## What changed

- Added `codex-app-server-transport` and moved the existing transport
tree into it, including stdio, unix socket, websocket, remote-control
transport, and websocket auth.
- Moved shared transport-facing message types into the new crate so both
the transport implementation and `codex-app-server` use the same
definitions.
- Kept processor-facing connection state and outbound routing in
`codex-app-server`, with the routing tests moved next to that local
wrapper.
- Updated workspace metadata, Bazel crate metadata, and
`codex-app-server` dependencies for the new crate boundary.

## Validation

- `cargo metadata --locked --no-deps`
- `git diff --check`
- Attempted `cargo test -p codex-app-server-transport`, `cargo test -p
codex-app-server`, `just fix -p codex-app-server-transport`, and `just
fix -p codex-app-server`; all were blocked before compilation by the
existing `packageproxy` resolution failure for locked `rustls-webpki =
0.103.13`.
- Attempted Bazel build / lockfile validation; those were blocked by
external fetch failures against BuildBuddy / GitHub while resolving
`v8`.
2026-05-01 09:23:47 -07:00
jif-oai
5744b85b9a fix: cargo deny (#20627)
Fix cargo deny by ack the `RUSTSEC` while a fix land
```
  RUSTSEC-2026-0118
  NSEC3 closest-encloser proof validation enters unbounded loop on cross-zone responses

  RUSTSEC-2026-0119
  CPU exhaustion during message encoding due to O(n²) name compression

  Dependency path:

  hickory-proto 0.25.2
  └── hickory-resolver 0.25.2
      └── rama-dns 0.3.0-alpha.4
          └── rama-tcp 0.3.0-alpha.4
              └── codex-network-proxy
```

Also upgrade some workers version to prevent this:
```
warning[license-not-encountered]: license was not encountered
    ┌─ ./codex-rs/deny.toml:131:6
    │
131 │     "OpenSSL",
    │      ━━━━━━━ unmatched license allowance

warning[duplicate]: found 2 duplicate entries for crate 'base64'
   ┌─ /github/workspace/codex-rs/Cargo.lock:79:1
   │
79 │ ╭ base64 0.21.7 registry+https://github.com/rust-lang/crates.io-index
80 │ │ base64 0.22.1 registry+https://github.com/rust-lang/crates.io-index
   │ ╰───────────────────────────────────────────────────────────────────┘ lock entries
```
2026-05-01 18:15:38 +02:00
Eric Traut
3d1d164aee Remove no-tool goal continuation suppression (#20523)
## Why

`/goal` is supposed to keep Codex working until the goal is actually
done. The previous continuation logic had two ways to stop early: the
continuation prompt told the model to wait for new input when it felt
blocked, and the runtime suppressed another continuation turn after a
continuation finished without any tool calls.

That made goals stop short even when the agent could still keep making
progress (I received a few reports of this from users). It also relied
on a brittle heuristic that treated "no registry tool calls" as
equivalent to "should stop."

## What changed

- removed the continuation prompt sentence that told the model to stop
and wait for new input when it could not continue productively
- removed the goal runtime suppression heuristic that stopped
auto-continuation after a no-tool continuation turn
- deleted the continuation-activity bookkeeping and left `tool_calls` as
telemetry only
- added focused regressions for the two intended behaviors: completed
no-tool continuation turns still continue, while `request_user_input`
keeps the existing turn open instead of spawning a new continuation
2026-05-01 09:09:55 -07:00
Eric Traut
227bee0445 Enforce animations = false for screen readers (#20564)
## Why

Issue #20489 calls out that animated TUI affordances can be noisy for
screen-reader users. Codex already has `tui.animations = false` as a
reduced-motion setting, but some live activity rows render spinner-style
prefixes in that mode. These were relatively recent regressions.

We have also regressed this pattern more than once by adding new
spinner/shimmer callsites that do not think through the reduced-motion
path, so this PR adds a small guardrail while fixing the current
surfaces.

## What changed

- Omit the live status-row spinner when animations are disabled, so the
row starts with stable text like `Working (...)`.
- Render running hook headers without the spinner prefix when animations
are disabled, while preserving shimmer/spinner behavior when animations
are enabled.
- Centralize TUI activity indicators in `tui/src/motion.rs`, with
explicit reduced-motion choices for hidden prefixes, static bullets, and
plain shimmer-text fallbacks.
- Route existing spinner/shimmer callsites through the central motion
helper, including exec rows, MCP/web-search/loading rows, hook rows,
plugin loading, and onboarding loading text.
- Add a source-scan regression test that rejects direct `spinner(...)`
or `shimmer_spans(...)` usage outside the central module and primitive
definition.
- Add focused coverage that reduced-motion active exec rows are stable,
status rows start without a spinner, running hooks omit the spinner, and
MCP inventory loading stays stable.
- Update the one affected status-indicator snapshot; the existing detail
tree prefix remains unchanged.

## Verification

- `cargo test -p codex-tui`
2026-05-01 09:07:56 -07:00
pakrym-oai
f476338f93 Move apply-patch file changes into turn items (#20540)
## Why

Apply-patch file changes are now part of the core turn item stream, so
v2 clients can consume the same first-class item lifecycle path used by
other turn items instead of relying on app-server-specific remapping
from legacy patch events.

## What changed

- Added a core `TurnItem::FileChange` carrying apply-patch changes and
completion metadata.
- Updated the apply-patch tool emitter to send `ItemStarted` /
`ItemCompleted` with the new `FileChange` item while preserving legacy
`PatchApplyBegin` / `PatchApplyEnd` fan-out.
- Updated app-server v2 conversion to render the new core item directly
and stopped `event_mapping` from remapping old patch begin/end events
into item notifications.
- Kept thread history reconstruction based on the existing old
apply-patch events for rollout compatibility.

## Verification

- `cargo test -p codex-protocol -p codex-app-server-protocol`
- `cargo test -p codex-core --test all
apply_patch_tool_executes_and_emits_patch_events`
- `cargo test -p codex-app-server bespoke_event_handling`
2026-05-01 08:47:18 -07:00
jif-oai
0b04d1b3cc feat: export and replay effective config locks (#20405)
## Why

For reproducibility. A hand-written `config.toml` is not enough to
recreate what a Codex session actually ran with because layered config,
CLI overrides, defaults, feature aliases, resolved feature config,
prompt setup, and model-catalog/session values can all affect the final
runtime behavior.

This PR adds an effective config lockfile path: one run can export the
resolved session config, and a later run can replay that lockfile and
fail early if the regenerated effective config drifts.

## What Changed

- Add a dedicated `ConfigLockfileToml` wrapper with top-level lockfile
metadata plus the replayable config:

  ```toml
  version = 1
  codex_version = "..."

  [config]
  # effective ConfigToml fields
  ```

- Keep lockfile metadata out of regular `ConfigToml`; replay loads
`ConfigLockfileToml` and then uses its nested `config` as the
authoritative config layer.
- Add `debug.config_lockfile.export_dir` to write
`<thread_id>.config.lock.toml` when a root session starts.
- Add `debug.config_lockfile.load_path` to replay a saved lockfile and
validate the regenerated session lockfile against it.
- Add `debug.config_lockfile.allow_codex_version_mismatch` to optionally
tolerate Codex binary version drift while still comparing the rest of
the lockfile.
- Add `debug.config_lockfile.save_fields_resolved_from_model_catalog` so
lock creation can either save model-catalog/session-resolved fields or
intentionally leave those fields dynamic.
- Build lockfiles from the effective config plus resolved runtime values
such as model selection, reasoning settings, prompts, service tier, web
search mode, feature states/config, memories config, skill instructions,
and agent limits.
- Materialize feature aliases and custom feature config into the
lockfile so replay compares canonical resolved behavior instead of
user-authored alias shape.
- Strip profile/debug/file-include/environment-specific inputs from
generated lockfiles so they contain replayable values rather than the
inputs that produced those values.
- Surface JSON-RPC server error code/data in app-server client and TUI
bootstrap errors so config-lock replay failures include the actual TOML
diff.
- Regenerate the config schema for the new debug config keys.

## Review Notes

The main flow is split across these files:

- `config/src/config_toml.rs`: lockfile/debug TOML shapes.
- `core/src/config/mod.rs`: loading `debug.config_lockfile.*`, replaying
a lockfile as a config layer, and preserving the expected lockfile for
validation.
- `core/src/session/config_lock.rs`: exporting the current session
lockfile and materializing resolved session/config values.
- `core/src/config_lock.rs`: lockfile parsing, metadata/version checks,
replay comparison, and diff formatting.

## Usage

Export a lockfile from a normal session:

```sh
codex -c 'debug.config_lockfile.export_dir="/tmp/codex-locks"'
```

Export a lockfile without saving model-catalog/session-resolved fields:

```sh
codex -c 'debug.config_lockfile.export_dir="/tmp/codex-locks"' \
  -c 'debug.config_lockfile.save_fields_resolved_from_model_catalog=false'
```

Replay a saved lockfile in a later session:

```sh
codex -c 'debug.config_lockfile.load_path="/tmp/codex-locks/<thread_id>.config.lock.toml"'
```

If replay resolves to a different effective config, startup fails with a
TOML diff.

To tolerate Codex binary version drift during replay:

```sh
codex -c 'debug.config_lockfile.load_path="/tmp/codex-locks/<thread_id>.config.lock.toml"' \
  -c 'debug.config_lockfile.allow_codex_version_mismatch=true'
```

## Limitations

This does not support custom rules/network policies.

## Verification

- `cargo test -p codex-core config_lock`
- `cargo test -p codex-config`
- `cargo test -p codex-thread-manager-sample`
2026-05-01 17:46:02 +02:00
jif-oai
ff27d01676 feat: seed ad-hoc memory extension instructions (#20606)
## Summary

Ad-hoc memory notes are written under `memories/extensions/ad_hoc/`, but
the consolidation agent only knows how to interpret an extension when
the extension folder has an `instructions.md`. Seed those instructions
from the memories write pipeline so an enabled memories startup creates
the expected ad-hoc extension layout automatically.

This also moves extension-specific write behavior behind a dedicated
`memories/write/src/extensions/` module. `ad_hoc` owns the seeded
instructions template, while the existing resource-retention cleanup
lives in its own `prune` module so future memory extensions can add
their own write-side setup without growing a flat helper file.

## Changes

- Seed `memories/extensions/ad_hoc/instructions.md` during eligible
memory startup without overwriting an existing file.
- Store the ad-hoc instructions template under
`memories/write/templates/extensions/ad_hoc/`, keeping ownership in
`codex-memories-write`.
- Split memory extension support into `extensions::ad_hoc` and
`extensions::prune`.
- Keep the existing old-resource pruning behavior unchanged.

## Verification

- `cargo test -p codex-memories-write`
- `bazel build //codex-rs/memories/write:write`

---------

Co-authored-by: chatgpt-codex-connector[bot] <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
2026-05-01 14:43:58 +02:00
jif-oai
70fc55b8f3 chore: improve remember prompt (#20610) 2026-05-01 14:38:07 +02:00
jif-oai
97aae46800 feat: ad-hoc instructions (#20602) 2026-05-01 13:42:54 +02:00
jif-oai
ad404c8400 chore: allow memories edition (#20600) 2026-05-01 13:27:37 +02:00
xl-openai
48791920a8 feat: Track local paths for shared plugins (#20560)
When a local plugin is shared, Codex now records the local plugin path
by remote plugin id under CODEX_HOME/.tmp.

plugin/share/list includes the remote share URL and the matching local
plugin path when available, and plugin/share/delete
clears the local mapping after deleting the remote share.

Also add sharedURL to plugin/share/list.
2026-05-01 00:50:12 -07:00
xli-oai
96d2ea9058 Add remote plugin skill read API (#20150)
## Summary

Adds an app-server `plugin/skill/read` method for remote plugin skill
markdown. The new method calls the plugin-service skill detail endpoint
and returns `skill_md_contents`, so clients can preview skills for
remote plugins before the bundle is installed locally.

## Why

Uninstalled remote plugin skills do not have local `SKILL.md` files.
Without an on-demand remote read, the desktop plugin details UI cannot
render the skill details modal for those skills.

## Validation

- `just write-app-server-schema`
- `just fmt`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server --test all --
suite::v2::plugin_read::plugin_skill_read_reads_remote_skill_contents_when_remote_plugin_enabled
--exact`
- `just fix -p codex-app-server-protocol -p codex-core-plugins -p
codex-app-server`
2026-05-01 00:16:25 -07:00
xli-oai
a62b52f826 Refresh remote plugin cache on auth changes (#20265)
## Summary
- Refresh the remote installed-plugin cache after login/logout instead
of keying it by account or eagerly clearing it.
- Reuse the existing single-flight remote installed refresh loop so
newer queued auth refreshes replace older pending requests and the API
result eventually overwrites or clears the cache.
- Keep derived plugin/skills cache and MCP refresh side effects behind
the existing effective-plugin-changed task when the refreshed installed
state changes.
- Leave `clear_plugin_related_caches` scoped to derived plugin/skills
caches so share mutations do not drop remote installed plugins.

## Tests
- `cargo fmt --all --manifest-path codex-rs/Cargo.toml` (passes; stable
rustfmt warns that `imports_granularity = Item` is nightly-only)
- `cargo test -p codex-core-plugins remote_installed_cache`
- `cargo test -p codex-app-server
skills_list_loads_remote_installed_plugin_skills_from_cache`
2026-04-30 23:11:14 -07:00
Eric Traut
a93c89f497 Color TUI statusline from active theme (#19631)
## Why

Users have shared that the TUI can feel too visually flat because themes
mostly show up in code syntax highlighting. The configurable statusline
is a natural place to make the active theme more visible, while still
letting users keep the existing monotone statusline if they prefer it.

## What Changed

- Added a statusline styling helper that builds the rendered statusline
from `(StatusLineItem, text)` segments, preserving item identity while
keeping the plain text output unchanged.
- Derived foreground accent colors from the active syntax theme by
looking up TextMate scopes through the existing syntax highlighter, with
conservative ANSI fallbacks when a scope does not provide a foreground.
- Tuned theme-derived colors to keep the accents visible without making
the statusline feel overly bright.
- Added `[tui].status_line_use_colors`, defaulting to `true`, plus a
separated `/statusline` toggle so users can enable or disable
theme-derived statusline colors from the setup UI.
- Updated the live statusline and `/statusline` preview to use the same
styled builder, while keeping terminal-title preview text plain.
- Kept statusline separators and active-agent add-ons subdued while
removing blanket dimming from the whole passive statusline.

## Verification

- `cargo test -p codex-tui status_line`
- `cargo test -p codex-tui theme_picker`
- `cargo test -p codex-tui foreground_style_for_scopes`
- `cargo test -p codex-tui`
- `cargo test -p codex-config`
- `cargo test -p codex-core status_line_use_colors`
- `cargo insta pending-snapshots --manifest-path tui/Cargo.toml`

## Visual

<img width="369" height="23" alt="Screenshot 2026-04-30 at 6 16 08 PM"
src="https://github.com/user-attachments/assets/11d03efb-8e4f-4450-8f4d-00a9659ef4cd"
/>

<img width="385" height="23" alt="Screenshot 2026-04-30 at 6 16 02 PM"
src="https://github.com/user-attachments/assets/a3d89f36-bdc1-42e8-8e84-61350e3999e2"
/>
2026-04-30 22:42:48 -07:00
Eric Traut
d898cc8f3f Format multi-day goal durations in the TUI (#20558)
## Why

Goal mode shows elapsed time in compact hour/minute form. That is easy
to scan for shorter runs, but once a goal runs past 24 hours, large hour
counts become harder to read at a glance.

## What changed

Updated `codex-rs/tui/src/goal_display.rs` so unbudgeted goal elapsed
time keeps the existing compact format below one day, then switches to a
day-aware format once the elapsed time reaches 24 hours:

- `23h 59m`
- `1d 0h 0m`
- `2d 23h 42m`

The formatter now covers the 24-hour boundary in unit tests, and the TUI
status-line snapshot for a completed elapsed goal now exercises the
multi-day display.

## Verification

- `cargo test -p codex-tui`

Here's my longest-running test task:

<img width="186" height="23" alt="image"
src="https://github.com/user-attachments/assets/cedfcdab-7f6e-44e6-8495-8a39f63973fb"
/>
2026-04-30 22:42:07 -07:00
Tom
fe05acad23 Make thread store process-scoped (#19474)
- Build one app-server process ThreadStore from startup config and share
it with ThreadManager and CodexMessageProcessor.
- Remove per-thread/fork store reconstruction so effective thread config
cannot switch the persistence backend.
- Add params to ThreadStore create/resume for specifying thread
metadata, since otherwise the metadata from store creation would be used
(incorrectly).
2026-04-30 21:24:59 -07:00
pakrym-oai
f50c02d7bc [codex] Remove unused event messages (#20511)
## Why

Several legacy `EventMsg` variants were still emitted or mapped even
though clients either ignored them or had moved to item/lifecycle
events. `Op::Undo` had also degraded to an unavailable shim, so this
removes that dead task path instead of preserving a command that cannot
do useful work.

`McpStartupComplete`, `WebSearchBegin`, and `ImageGenerationBegin` are
intentionally kept because useful consumers still depend on them: MCP
startup completion drives readiness behavior, and the begin events let
app-server/core consumers surface in-progress web-search and
image-generation items before the final payload arrives.

## What Changed

- Removed weak legacy event variants and payloads from `codex-protocol`,
including legacy agent deltas, background events, and undo lifecycle
events.
- Kept/restored `EventMsg::McpStartupComplete`,
`EventMsg::WebSearchBegin`, and `EventMsg::ImageGenerationBegin` with
serializer and emission coverage.
- Updated core, rollout, MCP server, app-server thread history,
review/delegate filtering, and tests to rely on the useful replacement
events that remain.
- Removed `Op::Undo`, `UndoTask`, the undo test module, and stale TUI
slash-command comments.
- Stopped agent job/background progress and compaction retry notices
from emitting `BackgroundEvent` payloads.

## Verification

- `cargo check -p codex-protocol -p codex-app-server-protocol -p
codex-core -p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- `cargo test -p codex-protocol -p codex-app-server-protocol -p
codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- `cargo test -p codex-core --test all suite::items`
- `just fix -p codex-protocol -p codex-app-server-protocol -p codex-core
-p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- Earlier coverage on this PR also included `codex-mcp`, `codex-tui`,
core library tests, MCP/plugin/delegate/review/agent job tests, and MCP
startup TUI tests.
2026-04-30 20:03:26 -07:00
xli-oai
bb60b78c46 Surface admin-disabled remote plugin status (#20298)
## Summary

Remote plugin-service returns plugin availability separately from a
user's installed/enabled state. This adds `PluginAvailabilityStatus` to
the app-server protocol, propagates remote catalog `status` into
`PluginSummary`, and rejects install attempts for remote plugins marked
`DISABLED_BY_ADMIN` before downloading or caching the bundle.

This is the `openai/codex` half of the change. The companion
`openai/openai` webview PR is
https://github.com/openai/openai/pull/873269.

## Validation

- `cargo run -p codex-app-server-protocol --bin write_schema_fixtures`
- `cargo test -p codex-app-server --test all
plugin_list_marks_remote_plugin_disabled_by_admin`
- `cargo test -p codex-app-server --test all
plugin_list_includes_remote_marketplaces_when_remote_plugin_enabled`
- `cargo test -p codex-app-server --test all
plugin_install_rejects_remote_plugin_disabled_by_admin_before_download`
- `cargo test -p codex-app-server-protocol schema_fixtures`
2026-04-30 20:00:07 -07:00
Tom
c39824c2fd [codex] Improve PR babysitter CI diagnostics and guardrails (#20484)
## Summary

- Surface failed GitHub Actions jobs in the PR babysitter watcher so
Codex can fetch job logs as soon as a job fails, instead of waiting for
the overall workflow run to complete.
- Update babysit-pr skill instructions, GitHub API notes, and heuristics
to prefer direct job log archives before falling back to `gh run view
--log-failed`.
- Add guardrails requiring explicit user confirmation before posting
replies to human-authored review comments.
- Add guardrails preventing Codex from patching unrelated flaky tests,
CI infrastructure, runner issues, dependency outages, or other failures
not caused by the PR branch.

## Validation

- `python3 -m pytest
.codex/skills/babysit-pr/scripts/test_gh_pr_watch.py`
2026-04-30 19:58:19 -07:00
rhan-oai
6b1b227804 [codex-analytics] centralize thread analytics state (#20300)
## Why

Several analytics event families need the same per-thread attribution
state: the app-server client/runtime associated with a thread and, for
lifecycle-oriented events, the thread metadata captured during
initialization. Keeping connection ids and lifecycle metadata in
separate maps made each consumer rebuild the same thread context and
made subagent attribution harder to resolve consistently.

## What changed

- Replaces the separate thread connection and metadata maps with one
reducer-owned `threads` map.
- Routes guardian, compaction, turn-steer, and turn analytics through
shared thread-state lookups while preserving turn-origin attribution for
turn events and request-origin attribution for steer events.
- Lets newly observed spawned subagent threads inherit their parent
thread connection so later thread-scoped analytics can resolve through
the same state model.
- Adds regression coverage for standalone `SubAgentThreadStarted`
publication plus the `SubAgentSource::ThreadSpawn` parent fallback
through a thread-scoped consumer that depends on inherited connection
state.

## Verification

- `cargo test -p codex-analytics`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/20300).
* #18748
* #18747
* #17090
* #17089
* #20239
* #20515
* #20514
* __->__ #20300
2026-04-30 18:58:50 -07:00
Ruslan Nigmatullin
972b819213 app-server: switch remote control to protocol v3 segmentation (#20341)
## Why

Remote-control protocol v3 makes segmentation an explicit wire-level
feature. The app-server transport needs to support that protocol
directly so large messages can be chunked, acknowledged, replayed, and
reassembled consistently.

## What changed

- Bump the remote-control websocket protocol version from `2` to `3`.
- Add explicit client/server chunk envelope variants plus chunk-aware
acknowledgements.
- Split oversized outbound server messages into bounded transport
chunks.
- Reassemble ordered inbound client chunks with bounded memory usage and
stream/client invalidation handling.
- Track inbound chunk cursors and outbound ack cursors as `(seq_id,
segment_id)` so duplicate chunks and partial replays behave correctly.
- Add focused coverage for chunk splitting, reassembly, duplicate
suppression, and stream replacement behavior.

## Validation

- Added targeted unit coverage for segmented message handling in
`remote_control`.
- Local validation is currently blocked before compilation because
`packageproxy` does not serve the locked `rustls-webpki 0.103.13`
dependency required by the workspace.
2026-04-30 18:27:16 -07:00
Dylan Hurd
af089fb21d fix(exec_policy) heredoc parsing file_redirect (#20113)
## Summary
Fixes a regression introduced in #10941 so that heredocs do not permit
file redirects to be approved by rules, and adds scenario tests to cover
this behavior.


Previously, heredoc command parsing would allow redirects and
environment variables:
```bash
# commands_for_exec_policy() would parse this via parse_shell_lc_single_command_prefix
PATH=/tmp/bad:$PATH cat <<'EOF' > /tmp/bad/hello.txt
hello
EOF
```
This conflicts with the Codex Rules documentation; heredoc parsing logic
should abide by the same strictness of parsing.


## Tests
- [x] Updated unit tests accordingly
- [x] Added scenario tests for these cases

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-01 01:05:02 +00:00
iceweasel-oai
4f96001fa7 execpolicy: unwrap PowerShell -Command wrappers on Windows (#20336)
## Why
On Windows, Codex runs shell commands through a top-level
`powershell.exe -NoProfile -Command ...` wrapper. `execpolicy` was
matching that wrapper instead of the inner command, so prefix rules like
`["git", "push"]` did not fire for PowerShell-wrapped commands even
though the same normalization already happens for `bash -lc` on Unix.

This change makes the Windows shell wrapper transparent to rule matching
while preserving the existing Windows unmatched-command safelist and
dangerous-command heuristics.

## What changed
- add `parse_powershell_command_plain_commands()` in
`shell-command/src/powershell.rs` to unwrap the top-level PowerShell
`-Command` body with `extract_powershell_command()` and parse it with
the existing PowerShell AST parser
- update `core/src/exec_policy.rs` so `commands_for_exec_policy()`
treats top-level PowerShell wrappers like `bash -lc` and evaluates rules
against the parsed inner commands
- carry a small `ExecPolicyCommandOrigin` through unmatched-command
evaluation and expose `is_safe_powershell_words()` /
`is_dangerous_powershell_words()` so Windows safelist and
dangerous-command checks still work after unwrap
- add Windows-focused tests for wrapped PowerShell prompt/allow matches,
wrapper parsing, and unmatched safe/dangerous inner commands, and
re-enable the end-to-end `execpolicy_blocks_shell_invocation` test on
Windows

## Testing
- `cargo test -p codex-shell-command`
2026-05-01 00:56:20 +00:00
Abhinav
0d9a5d20ec Alias codex_hooks feature as hooks (#20522)
# Why

The hooks feature flag should use the concise canonical name `hooks`,
while existing configs that still use `codex_hooks` continue to work
during the rename.

# What

- change the canonical `Feature::CodexHooks` key from `codex_hooks` to
`hooks`
- register `codex_hooks` through the existing legacy-alias path
- update the config schema and canonical config fixtures to prefer
`hooks`
- add regression coverage that both `hooks` and `codex_hooks` resolve to
`Feature::CodexHooks`

# Verification

- `cargo test -p codex-features`
- `cargo test -p codex-core config::schema_tests`
- `cargo test -p codex-core
pre_tool_use_blocks_shell_when_defined_in_config_toml`
- `cargo test -p codex-app-server
hooks_list_uses_each_cwds_effective_feature_enablement`
2026-05-01 00:46:33 +00:00
Owen Lin
5affb7f9d5 fix(app-server): mark thread/turns/list and exclude_turns as experime… (#20499)
…ntal

We have some bugs to work out and it is not quite ready to consume as a
public API.
2026-04-30 17:39:08 -07:00
xli-oai
acdf908268 Emit analytics for remote plugin installs (#20267)
## Summary

- emit `codex_plugin_installed` after a remote plugin install succeeds
- keep local installs unchanged, but let remote installs override the
analytics `plugin_id` with the backend remote plugin id
(`plugins~Plugin_...`)
- preserve the local/display identity in `plugin_name` and
`marketplace_name`, plus capability metadata from the installed bundle
- add regression coverage for local install analytics, remote install
analytics, and analytics id override serialization

## Testing

- `just fmt`
- `cargo test -p codex-analytics`
- `cargo test -p codex-app-server`
2026-04-30 17:27:16 -07:00
Felipe Coury
b6f81257f8 feat(tui): add vim composer mode (#18595)
## Why

Codex now has configurable TUI keymaps, but the composer still behaves
like a plain text field. Users who prefer modal editing need a way to
keep Vim muscle memory while drafting prompts, and the keymap picker
needs to expose Vim-specific actions if those bindings are configurable
instead of hardcoded.

## What Changed

- Adds composer Vim mode with insert/normal state, common normal-mode
movement and editing commands, `d`/`y` operator-pending flows, and
mode-aware footer and cursor indicators.
- Adds `/vim`, an optional global `toggle_vim_mode` binding, and
`tui.vim_mode_default` so Vim mode can be toggled per session or enabled
as the default composer state.
- Extends runtime and config keymaps with `vim_normal` and
`vim_operator` contexts, exposes those contexts in `/keymap`, refreshes
the config schema, and validates Vim bindings separately.
- Integrates Vim normal mode with existing composer behavior: `/` opens
slash command entry, `!` enters shell mode, `j`/`k` navigate history at
history boundaries, successful submissions reset back to normal mode,
and paste burst handling remains insert-mode only.
- Teaches the TUI render path to apply and restore cursor style so Vim
insert mode can use a bar cursor without leaving the terminal in that
state after exit.

## Validation

- `cargo test -p codex-tui keymap -- --nocapture` on the keymap/Vim
coverage
- `cargo insta pending-snapshots`

## Docs

This introduces user-facing `/vim`, `tui.vim_mode_default`, and Vim
keymap contexts under `tui.keymap`, so the public CLI configuration and
slash-command docs should be updated before the feature ships.
2026-04-30 17:20:51 -07:00
maja-openai
a5ebedef67 Bypass review for always-allow MCP tools in auto-review (#20069)
## Why

When an MCP or app tool is configured with approval mode `approve`
(always allow), users expect that decision to be authoritative. In
guardian auto-review mode, ARC could still return `ask-user`, which then
routed the approval question into guardian with the ARC reason as
context. That meant a tool explicitly configured as always allowed still
went through both safety monitors before running.

This change keeps the existing ARC behavior for non-auto-review
sessions, but avoids the ARC-to-guardian sequence when
`approvals_reviewer = auto_review` and the tool approval mode is
`approve`.

## What changed

- Short-circuit MCP tool approval handling when `approval_mode ==
approve` and `approvals_reviewer == auto_review`.
- Updated the MCP approval regression test so the auto-review case
asserts neither ARC nor guardian is called.
- Preserved existing tests that verify ARC can still block always-allow
MCP tools outside guardian auto-review mode.

## Verification

- `cargo test -p codex-core --lib mcp_tool_call`
2026-04-30 16:44:09 -07:00
Owen Lin
5de7992ee5 fix(tui): set persist_extended_history: false (#20502)
Large rollouts are no good. This updates the TUI to behave the same as
the Codex App, which is also turning it off.
2026-04-30 23:31:31 +00:00
xli-oai
2686873e77 Sync remote installed plugin bundles (#20268)
## Summary
- Download missing remote installed plugin bundles during app-server
startup and plugin/list refresh.
- Upgrade cached remote installed bundles when the backend installed
version changes.
- Remove stale remote installed bundle caches without writing remote
plugin state into config.toml.

## Review note
This is a clean PR branch cut from the current diff on top of latest
`origin/main`. The diff intentionally has no `codex-rs/core/**` files,
so CODEOWNERS should not request the core-directory owner review from
stale PR history.

## Validation
Already run on the source branch before creating this clean PR:
- `just fmt`
- `cargo test -p codex-core-plugins`
- `cargo test -p codex-app-server --test all
app_server_startup_sync_downloads_remote_installed_plugin_bundles --
--nocapture`
- `cargo test -p codex-app-server --test all
plugin_list_sync_upgrades_and_removes_remote_installed_plugin_bundles --
--nocapture`
- `cargo test -p codex-app-server --test all
app_server_startup_remote_plugin_sync_runs_once -- --nocapture`
- `just fix -p codex-core-plugins`
- `just fix -p codex-app-server`
- `git diff --check`
2026-04-30 16:05:14 -07:00
Owen Lin
9ddb267e9c fix: ignore dangerous project-level config keys (#20098)
## Description
Ignore these top-level config keys when loading project-scoped
config.toml files:
```
    "openai_base_url",
    "chatgpt_base_url",
    "model_provider",
    "model_providers",
    "profile",
    "profiles",
    "experimental_realtime_ws_base_url",
```

## What changed

- Add a project-local config denylist for credential-routing fields such
as `openai_base_url`, `chatgpt_base_url`, `model_provider`,
`model_providers`, `profile`, `profiles`, and
`experimental_realtime_ws_base_url`.
- Strip those fields from project config layers before they participate
in effective config merging, while leaving safe project-local settings
intact.
- Track ignored project-local keys on config layers and surface a
startup warning telling users to move those settings to user-level
`config.toml` if they intentionally need them.
- Update profile behavior coverage so project-local `profile` /
`profiles` entries are ignored instead of overriding user-level profile
selection.

## Verification

- `cargo test -p codex-config`
- `cargo test -p codex-core
project_layer_ignores_unsupported_config_keys`
- `cargo test -p codex-core project_profiles_are_ignored`
- `cargo test -p codex-core config::config_loader_tests`
2026-04-30 23:03:01 +00:00
Owen Lin
6014b6679f fix flaky test falls_back_to_registered_fallback_port_when_default_po… (#20504)
…rt_is_in_use
2026-04-30 22:06:04 +00:00
Akshay Nathan
8426edf71e Stateful streaming apply_patch parser 2026-04-30 21:41:15 +00:00
xl-openai
7b3de63041 Move plugin out of core. (#20348) 2026-04-30 14:26:14 -07:00
Tom
127be0612c [codex] Migrate thread turns list to thread store (#19280)
- migrate `thread/turns/list` to ThreadStore. Uses ThreadStore for most
data now but merges in the in-memory state from thread manager
- keep v2 `thread/list` pathless-store friendly by converting
`StoredThread` directly to API `Thread`
- add regression coverage for pathless store history/listing
2026-04-30 14:16:42 -07:00
alexsong-oai
9121132c8f Send external import completion for sync imports (#20379) 2026-04-30 13:03:21 -07:00
Matthew Zeng
70090c9ff7 [plugin] Add Canva to suggesteable list. (#20474)
- [x] Add Canva to suggesteable list.
2026-04-30 12:39:52 -07:00
iceweasel-oai
8121710ffe install WFP filters for Windows sandbox setup (#20101)
## Summary

This PR installs a first wave of WFP (Windows Filtering Platform)
filters that reduce the surface area of network egress vulnerabilities
for the Windows Sandbox.

- Add persistent Windows Filtering Platform provider, sublayer, and
filters for the Windows sandbox offline account.
- Install WFP filters during elevated full setup, log failures
non-fatally, and emit setup metrics when analytics are enabled.
- Bump the Windows sandbox setup version so existing users rerun full
setup and receive the new filters.

## What WFP is
Windows Filtering Platform (WFP) is the low-level Windows networking
policy engine underneath things like Windows Firewall. It lets
privileged code install persistent filtering rules at specific network
stack layers, with conditions like "only traffic from this Windows
account" or "only this remote port," and an action like block.

In this change, we create a Codex-owned persistent WFP provider and
sublayer, then install block filters scoped to the Windows sandbox's
offline user account via `ALE_USER_ID`. That means the filters are
targeted at sandboxed processes running as that account, rather than
globally affecting the host.

## Initial filter set
We are starting with 12 concrete WFP filters across a few high-value
bypass surfaces. The table below describes the filter families rather
than one filter per row:

| Area | Concrete filters | Purpose |
| --- | --- | --- |
| ICMP | 4 filters: ICMP v4/v6 on `ALE_AUTH_CONNECT` and
`ALE_RESOURCE_ASSIGNMENT` | Block direct ping-style network reachability
checks from the offline account. |
| DNS | 2 filters: remote port `53` on `ALE_AUTH_CONNECT_V4/V6` | Block
direct DNS queries that bypass our intended proxy/offline path. |
| DNS-over-TLS | 2 filters: remote port `853` on
`ALE_AUTH_CONNECT_V4/V6` | Block encrypted DNS attempts that could
bypass ordinary DNS interception. |
| SMB / NetBIOS | 4 filters: remote ports `445` and `139` on
`ALE_AUTH_CONNECT_V4/V6` | Block Windows file-sharing/network share
traffic from sandboxed processes. |

For IPv4/IPv6 coverage, the port-based filters are installed on both
`ALE_AUTH_CONNECT_V4` and `ALE_AUTH_CONNECT_V6`. ICMP also gets both
connect-layer and resource-assignment-layer coverage because ICMP
traffic is shaped differently from ordinary TCP/UDP port traffic.

## Validation
- `cargo fmt -p codex-windows-sandbox` (completed with existing
stable-rustfmt warnings about `imports_granularity = Item`)
- `cargo test -p codex-windows-sandbox wfp::tests`
- `cargo test -p codex-windows-sandbox` (fails in existing legacy
PowerShell sandbox tests because `Microsoft.PowerShell.Utility` could
not be loaded; WFP tests passed before that failure)
2026-04-30 12:39:01 -07:00
Owen Lin
7dd08e304c feat(rollouts): store EventMsg::ApplyPatchEnd in limited history mode (#20463)
The Codex App treats apply patch tool calls quite load-bearing in the UI
(always shown on a completed turn), so we'd like to persist
`EventMsg::ApplyPatchEnd` to guarantee that when a client reconnects to
app-server mid-turn, we always have the full diff to display at the end
of that turn.
2026-04-30 12:11:02 -07:00
iceweasel-oai
06f3b4836a [codex] Fix elevated Windows sandbox named-pipe access (#20270)
## Summary
- add elevated-only token constructors that include the current token
user SID in the restricted SID list
- switch the elevated Windows command runner to use those constructors
- leave the unelevated restricted-token path unchanged

## Why
Windows named pipes created by tools like Ninja use the platform's
default named-pipe ACL when no explicit security descriptor is provided.
In the elevated sandbox, the pipe owner has access, but the
write-restricted token can still fail its restricted-SID access check
because the sandbox user SID was not in the restricting SID set. That
causes child processes to exit successfully while Ninja never receives
the expected pipe completion/close behavior and hangs.

Including the elevated sandbox user's SID in the restricting SID list
lets the restricted check succeed for these owner-scoped pipe objects
without broadening the unelevated sandbox to the real signed-in user.

## Impact
- fixes the minimal Ninja hang repro in the elevated Windows sandbox
- preserves the existing unelevated sandbox behavior and write
protections
- keeps the change scoped to the elevated runner rather than changing
shared token semantics
- this does not affect file-writes for the sandbox because the sandbox
users themselves do not receive any additional permissions over what the
capability SIDs already have. In fact we don't even explicitly grant the
sandbox user ACLs anywhere.

## Validation
- `cargo build -p codex-windows-sandbox --quiet`
- verified the stock `ninja.exe` minimal repro exits normally on host
and in the elevated sandbox
- verified the same repro still hangs in the unelevated sandbox, which
is the intended scope of this change
2026-04-30 12:06:11 -07:00
Celia Chen
31f8813e3e fix: show correct Bedrock runtime endpoint in /status (#20275)
## Why

`/status` was showing the configured `ModelProviderInfo.base_url` for
Amazon Bedrock, which can be stale or misleading because the actual
Bedrock Mantle endpoint is derived at runtime from the resolved AWS
region. This made sessions report the wrong provider endpoint even
though requests used the correct runtime URL.

## What changed

- Added `ModelProvider::runtime_base_url()` so provider implementations
can expose the request-time base URL through the shared runtime provider
abstraction.
- Moved Bedrock region-to-Mantle URL resolution into
`amazon_bedrock::mantle::runtime_base_url()`, keeping region resolution
private to the Mantle module.
- Overrode `runtime_base_url()` for Amazon Bedrock so it returns the
resolved Mantle endpoint instead of the configured default.
- Resolved and cached the runtime provider base URL during TUI startup,
then used that cached value when rendering `/status`.
- Added status coverage that verifies Bedrock displays the runtime URL
and ignores the configured Bedrock `base_url` when they differ.

## Verification
model provider is resolved correctly in local build:
<img width="696" height="245" alt="Screenshot 2026-04-29 at 5 01 36 PM"
src="https://github.com/user-attachments/assets/a13c10a5-3720-41ab-8ace-3c4bc573f971"
/>
2026-04-30 19:02:34 +00:00
Abhinav
93d53f655b Add /hooks browser for lifecycle hooks (#19882)
## Why

`hooks/list` and `hooks/config/write` give us read/write access to hooks
and their state. This hooks up the TUI as a client so users can inspect
and manage that state directly.

## What

- add a two-page `/hooks` browser in the TUI: an event overview with
installed/active counts, followed by a per-event handler page with
toggle controls and detail rendering
- thread managed-state metadata through hook discovery and `hooks/list`
so the UI can label admin-managed hooks and suppress toggles for them
- persist hook toggles through the existing config-write path and add
snapshot coverage for the event list, handler list, managed-hook, and
empty states

## Stack

1. openai/codex#19705
2. openai/codex#19778
3. openai/codex#19840
4. This PR - openai/codex#19882

## Reviewer Notes

- Main UI logic is in
`codex-rs/tui/src/bottom_pane/hooks_browser_view.rs`; most of the diff
is the new view plus its snapshot coverage
- Request / write plumbing for opening the browser and persisting
toggles is in `codex-rs/tui/src/app/background_requests.rs` and
`codex-rs/tui/src/chatwidget/hooks.rs`
- Outside the TUI, the only behavioral change in this PR is threading
`is_managed` through hook discovery and `hooks/list` so managed hooks
render as non-toggleable
- The `codex-rs/tui/src/status/snapshots/` churn is unrelated merge
fallout from the stacked base branch's newer permission-label rendering

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-30 11:58:27 -07:00
khoi
719431da6e [Codex] Add browser use external feature flag (#20245)
## Summary

- Adds a separate feature control for external-browser Browser Use
integrations.
- Registers `browser_use_external` as a stable, default-enabled
requirements-owned feature key.
- Updates feature registry tests and regenerates the config schema.

Codex validation:
- `cargo fmt -- --config imports_granularity=Item`
- `cargo run -p codex-core --bin codex-write-config-schema`
- `cargo test -p codex-features`

## Addendum

This gives enterprise policy a coarse control for Browser Use outside
the Codex-managed in-app browser. The existing `browser_use` feature is
the Browser Use control, while `browser_use_external` can gate
extension/native integrations for external browsers as that surface
grows
2026-04-30 11:53:19 -07:00
pakrym-oai
b52083146c Stop emitting item/fileChange/outputDelta output delta notifications (#20471)
## Why

`item/fileChange/outputDelta` text output was only the tool's summary or
error text and not used by client surfaces.

We keep `item/fileChange/outputDelta` in the app-server protocol as a
deprecated compatibility entry, but the server no longer emits it.

## What changed

- stop the `apply_patch` runtime from emitting `ExecCommandOutputDelta`
events
- simplify `item_event_to_server_notification` so command output deltas
always map to `item/commandExecution/outputDelta`
- remove the app-server bookkeeping that tried to detect whether an
output delta belonged to a file change
- mark `item/fileChange/outputDelta` as a deprecated legacy protocol
entry in the v2 types, schema, and README
- simplify the file-change approval tests so they only wait for
completion instead of expecting output-delta notifications

## Testing

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-thread-manager-sample`
- `cargo test -p codex-app-server-protocol
protocol::event_mapping::tests::exec_command_output_delta_maps_to_command_execution_output_delta
-- --exact`
- `cargo test -p codex-app-server
turn_start_file_change_approval_accept_for_session_persists_v2 --
--exact` *(failed before the test assertions because the wiremock
`/responses` mock received 0 requests in setup)*
2026-04-30 11:42:07 -07:00
Eric Traut
f2bc2f26a9 Remove core protocol dependency [2/2] (#20325)
## Why

With the local model layer and app-server routing in place from PR1,
this PR moves the active TUI runtime onto app-server notifications. The
affected pieces share the same event flow, so the command surface,
session state, bottom-pane prompts, chat rendering, history/status
views, and tests move together to keep the stacked branch buildable.

This PR also removes the obsolete compatibility surface that is no
longer used after the migration. The proposed protocol-boundary verifier
layer was dropped from the stack; enforcing that final boundary will be
simpler once `codex-tui` no longer needs any `codex_protocol`
references.

This PR is part 2 of a 2-PR stack:

1. Add TUI-owned replacement models and extract app-server event
routing.
2. Move the active TUI flow to app-server notifications and delete
obsolete adapter code.

## What changed

- Rewired app command and session handling to use app-server request and
notification shapes.
- Moved approval overlays, request-user-input flows, MCP elicitation,
realtime events, and review commands onto the app-server-facing model
surface.
- Updated chat rendering, history cells, status views, multi-agent UI,
replay state, and TUI tests to use app-server notifications plus the
local models introduced in PR1.
- Deleted `codex-rs/tui/src/app/app_server_adapter.rs` and the
superseded `chatwidget/tests/background_events.rs` fixture path.

## Verification

- `cargo check -p codex-tui --tests`
- Top of stack: `cargo test -p codex-tui`
2026-04-30 11:34:34 -07:00
pakrym-oai
5cc5f12efc Move item event mapping into app-server-protocol (#20299)
## Why

Follow-up to #20291.

The v2 item-event-to-notification translation had been embedded in
`app-server/src/bespoke_event_handling.rs`, which made it hard to reuse
anywhere else. This PR moves that stateless mapping into shared protocol
code so other entry points can produce the same `ServerNotification`
payloads without copying app-server logic.

That also lets `thread-manager-sample` demonstrate the same notification
surface that the app server exposes, instead of only printing the final
assistant message.

## What changed

- move `item_event_to_server_notification` into
`codex-app-server-protocol::protocol::event_mapping`
- keep the mapper tests next to the shared implementation in
`codex-app-server-protocol`
- re-export the mapper from `codex-core-api` so lightweight consumers
can use it without reaching into `app-server-protocol` directly
- simplify `app-server/src/bespoke_event_handling.rs` so it delegates
the stateless event-to-notification projection to the shared helper
- update `thread-manager-sample` to:
  - print mapped notifications as newline-delimited JSON
  - use the shared mapper through `codex-core-api`
- enable the default feature set so the sample exposes the normal tool
surface
- use a `read_only` permission profile so shell commands can run in the
sample without widening permissions

## Testing

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core-api`
- `cargo test -p codex-app-server bespoke_event_handling::tests`
- `cargo test -p codex-thread-manager-sample`
- `cargo run -p codex-thread-manager-sample -- "briefly explore the repo
with pwd and ls, then summarize it"`
2026-04-30 11:02:13 -07:00
Eric Traut
c70cdc108f Remove core protocol dependency [1/2] (#20324)
## Why

This stack moves `codex-tui` away from the core protocol event surface
and toward app-server API shapes plus TUI-owned local models. This first
PR sets up the lower-risk foundation: it introduces the local model
surface and extracts app-server event routing into focused TUI modules
while preserving the existing behavior for the larger migration in PR2.

This PR is part 1 of a 2-PR stack:

1. Add TUI-owned replacement models and extract app-server event
routing.
2. Move the active TUI flow to app-server notifications and delete
obsolete adapter code.

## What changed

- Added TUI-owned approval, diff, session state, session resume, token
usage, and user-message models.
- Added `app/app_server_event_targets.rs` and `app/app_server_events.rs`
to hold app-server event targeting and dispatch logic outside `app.rs`.
- Updated app/status tests to use the local model layer and added
focused routing coverage.
- Boxed a few large async TUI test futures so this base layer remains
checkable without overflowing the default test stack.

## Verification

- `cargo check -p codex-tui --tests`
2026-04-30 10:52:19 -07:00
teddywyly-oai
487716ae74 [Extension] Allowlist Chrome Extension in the tool_suggest tool (#20458)
### Summary
Allowlist chrome extension in tool_suggest tool

### Screenshot
Allowlist chrome extension in tool_suggest tool
<img width="808" height="309" alt="chrome_internal"
src="https://github.com/user-attachments/assets/ed769d77-b635-4a40-a0c5-fbff05af3036"
/>
2026-04-30 10:29:03 -07:00
canvrno-oai
a85d265097 /plugins: remove marketplace (#19843)
This PR adds marketplace removal to the /plugins menu, giving users a
way to remove user-configured plugin marketplaces. It adds a `Ctrl+R`
shortcut to remove selected marketplace tabs, a confirmation prompt,
loading and error states, and the app-server request flow needed to
perform marketplace/remove. After a successful removal, the TUI
refreshes config, plugin mentions, user config, and plugin data so the
removed marketplace disappears from the menu and other surfaces in the
TUI.

- Add `Ctrl+R` removal option for user-configured marketplace tabs
- Show marketplace removal confirmation, loading, and error states
- Route `marketplace/remove` through the TUI background request flow
- Refresh config, plugin mentions, and plugin data after successful
removal
- Adds reusable per-tab footer hints so removal guidance only appears on
applicable tabs
- Add test coverage for `Ctrl+R` behavior while plugin search is active

Steps to test:
- Add a marketplace using the TUI /plugins menu
- Use Ctrl+R to remove the marketplace
- Accept the confirmation prompt
- Confirm the marketplace is removed when the process completes.
2026-04-30 10:25:07 -07:00
Eric Traut
c02814c106 Mark goals feature as experimental (#20083)
## Why

The `goals` feature flag is ready to move out of the hidden
under-development bucket and into the user-facing experimental surface.
Marking it experimental lets users discover it through the experimental
features UI while still making clear that it is opt-in.

## What changed

- Changed `goals` from `Stage::UnderDevelopment` to
`Stage::Experimental` in `codex-rs/features/src/lib.rs`.
- Added experimental menu metadata for the feature with the description
`Set a persistent goal Codex can continue over time`.

## Verification

- `cargo test -p codex-features`
2026-04-30 10:06:44 -07:00
Owen Lin
3516cb9751 fix(core): truncate large mcp tool outputs in rollouts (#20260)
## Why
Large MCP tool call outputs can make rollout JSONL files enormous. In
the session that motivated this change, the biggest JSONL records were:
- `event_msg/mcp_tool_call_end`
- `response_item/function_call_output`

both containing the same unbounded MCP payloads - just 3 MCP tool calls
that each were multi-hundred MBs 😱

This PR truncates both of those JSONL records.

## How

#### For `response_item/function_call_output`
Unified exec already bounds tool output before it is injected into
model-facing history, which also keeps the corresponding rollout
`response_item/function_call_output` records small.

MCP should follow the same pattern: truncate the model-facing tool
output at the tool-output boundary, while leaving code-mode/raw hook
consumers alone.

#### For `event_msg/mcp_tool_call_end`
`McpToolCallEnd` also needs its own bounded event copy because it is the
app-server/replay/UI event shape that backs `ThreadItem::McpToolCall`.
Unfortunately this is _not_ downstream of the `ToolOutput` trait.

## Model behavior 
Model behavior is actually unchanged as a result of this PR. 

Before this PR, MCP output was:
1. Converted to `FunctionCallOutput`.
2. Recorded into in-memory history.
3. Truncated by `ContextManager::record_items()` before later model
turns saw it.

After this branch, MCP output is truncated earlier, in
`McpToolOutput::response_payload()`, using the same helper. Then
`ContextManager::record_items()` sees an already-truncated output and
effectively has little/no additional work to do.

So the model should still see the same kind of truncated function-call
output. The practical difference is where truncation happens: earlier,
before rollout persistence/app-server emission can see the giant
payload.

## Verification

- `cargo test -p codex-core mcp_tool_output`
- `cargo test -p codex-core
mcp_tool_call::tests::truncate_mcp_tool_result_for_event`
- `cargo test -p codex-core
mcp_post_tool_use_payload_uses_model_tool_name_args_and_result`
- `just fmt`
- `just fix -p codex-core`
- `git diff --check`
2026-04-30 16:30:43 +00:00
Ahmed Ibrahim
8a97f3cf03 realtime: rename provider session ids (#20361)
## Summary

Codex is repurposing `session` to mean a thread group, so the realtime
provider session id should no longer use `session_id` / `sessionId` in
Codex-facing protocol payloads. This PR renames that provider-specific
field to `realtime_session_id` / `realtimeSessionId` and intentionally
breaks clients that still send the old field names.

## What Changed

- Renamed realtime provider session fields in `ConversationStartParams`,
`RealtimeConversationStartedEvent`, and `RealtimeEvent::SessionUpdated`.
- Renamed app-server v2 realtime request and notification fields to
`realtimeSessionId`.
- Removed legacy serde aliases for `session_id` / `sessionId`; clients
must send the new names.
- Propagated the rename through core realtime startup, app-server
adapters, codex-api websocket handling, and TUI realtime state.
- Regenerated app-server protocol schema/TypeScript outputs and updated
app-server README examples.
- Kept upstream Realtime API concepts unchanged: provider `session.id`
parsing and `x-session-id` headers still use the upstream wire names.

## Testing

- CI is running on the latest pushed commit.
- Earlier local verification on this PR:
  - `cargo test -p codex-protocol`
- `CODEX_SKIP_VENDORED_BWRAP=1 cargo test -p codex-core
realtime_conversation`
  - `cargo test -p codex-app-server-protocol`
- `CODEX_SKIP_VENDORED_BWRAP=1 cargo test -p codex-app-server
realtime_conversation`
- attempted `CODEX_SKIP_VENDORED_BWRAP=1 cargo test -p codex-tui` (local
linker bus error while linking the test binary)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-30 13:39:48 +03:00
jif-oai
c37f7434ba Gate multi-agent v2 tools independently of collab (#20246)
## Why

`multi_agents_v2` is meant to be independently gated from the older
`collab` feature. The tool registry still treated the
collaboration-style agent tools as `collab`-only, so enabling
`multi_agents_v2` without `collab` omitted the v2 agent tools. Review
and guardian sub-sessions also need to keep agent spawning disabled even
when the outer session has `multi_agents_v2` enabled.

## What changed

- Include the collab-backed agent tools when either `multi_agents_v2` or
`collab` is enabled.
- Explicitly disable `multi_agents_v2` for review and guardian review
sub-sessions, matching the existing `spawn_csv` and `collab`
restrictions.
- Add a registry test that enables `multi_agents_v2`, disables `collab`,
and verifies the v2 agent tools are present while legacy `send_input`
and `resume_agent` remain hidden.

## Testing

- Added
`test_build_specs_multi_agent_v2_does_not_require_collab_feature`.
2026-04-30 10:23:31 +02:00
Eric Traut
a73403a890 Make missing config clears no-ops (#20334)
## Why

Fixes #20145.

`config/value/write` treats a JSON `null` value as a request to clear
the config key. Clearing a key that is already absent should be
idempotent, but clearing a nested key such as `features.personality`
from an empty `config.toml` returned `configPathNotFound` because
`clear_path` treated the missing `features` parent table as an error.

That makes app-server reset flows brittle because clients have to read
first and avoid sending a clear request unless the parent path already
exists.

## What Changed

- Updated app-server config clearing so missing intermediate tables, or
non-table parents, are treated as an unchanged no-op.
- Removed the now-unreachable `MergeError::PathNotFound` path from
config write merging.
- Added a regression test covering `features.personality = null` against
an empty user config.

## Verification

- `cargo test -p codex-app-server clear_missing_nested_config_is_noop`
- `cargo test -p codex-app-server` was run; the config manager unit
suite passed, but one unrelated integration test failed because
`turn_start_emits_thread_scoped_warning_notification_for_trimmed_skills`
expected `7` trimmed skills and observed `8`.
- `just fix -p codex-app-server`
2026-04-30 10:13:33 +02:00
xl-openai
87d0cf1a62 feat: Add workspace plugin sharing APIs (#20278)
1. Adds v2 plugin/share/save, plugin/share/list, and plugin/share/delete
RPCs.
2. Implements save by archiving a local plugin root, enforcing a size
limit, uploading through the workspace upload flow, and supporting
updates via remotePluginId.
3. Lists created workspace plugins
4. Deletes a previously uploaded/shared plugin.
2026-04-29 23:49:20 -07:00
Michael Bolin
ae863e72a2 ci: increase Windows release workflow timeouts (#20343)
## Why

#20271 increased the `90`-minute timeout in `rust-release.yml`, but it
did not update the reusable Windows workflow in
`rust-release-windows.yml`. As a result, the Windows release compile
jobs were still capped at `60` minutes and the `windows-x64` primary
build could continue timing out.

We are keeping the existing `90`-minute timeout in `rust-release.yml`.
That increase was still directionally correct because the top-level
release build benefits from extra headroom; the mistake was assuming it
also covered the reusable Windows jobs.

## What Changed
- increase the reusable Windows release workflow timeouts in
`rust-release-windows.yml` from `60` minutes to `90` minutes
- update the comment in `rust-release.yml` so it no longer implies that
the top-level timeout covers the Windows reusable jobs
2026-04-29 23:27:04 -07:00
Abhinav
8f3c06cc97 Add persisted hook enablement state (#19840)
## Why

After `hooks/list` exposes the hook inventory, clients need a way to
persist user hook preferences, make those changes effective in
already-open sessions, and distinguish user-controllable hooks from
managed requirements without adding another bespoke app-server write
API.

## What

- Extends `hooks/list` entries with effective `enabled` state.
- Persists user-level hook state under `hooks.state.<hook-id>` so the
model can grow beyond a single boolean over time.
- Uses the existing `config/batchWrite` path for hook state updates
instead of introducing a dedicated hook write RPC.
- Refreshes live session hook engines after config writes so
already-open threads observe updated enablement without a restart.

## Stack

1. openai/codex#19705
2. openai/codex#19778
3. This PR - openai/codex#19840
4. openai/codex#19882

## Reviewer Notes

The generated schema files account for much of the raw diff. The core
behavior is in:

- `hooks/src/config_rules.rs`, which resolves per-hook user state from
the config layer stack.
- `hooks/src/engine/discovery.rs`, which projects effective enablement
into `hooks/list` from source-derived managedness.
- `config/src/hook_config.rs`, which defines the new `hooks.state`
representation.
- `core/src/session/mod.rs`, which rebuilds live hook state after user
config reloads.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-30 04:46:32 +00:00
Michael Bolin
ac4332c05b permissions: expose active profile metadata (#20095) 2026-04-29 20:54:59 -07:00
Matthew Zeng
ebe602d005 [plugins] Allow MSFT curated plugins in tool_suggest (#20304)
## Summary
- [x] Move the allowlist out of core crate
- [x] Add Teams, SharePoint, Outlook Email, and Outlook Calendar to the
tool_suggest discoverable plugin allowlist
- [x] Add focused coverage for Microsoft curated plugin discovery

## Testing
- just fmt
- cargo test -p codex-core-plugins
- cargo test -p codex-core
list_tool_suggest_discoverable_plugins_returns_
2026-04-29 19:45:52 -07:00
pakrym-oai
4e677d62da app-server: remove dead api version handling from bespoke events (#20291)
Remove ApiVersion::V1
2026-04-30 01:55:44 +00:00
rhan-oai
bb536d65bd [codex-analytics] prevent stale guardian events from satisfying reused reviews (#20080)
## Why

Reused Guardian review trunks can still have older child-turn events
queued when a later review starts. The review waiter currently accepts
the first terminal event it sees from the shared child session, so a
stale `TurnComplete` can be attributed to the new review. That produces
impossible analytics combinations such as non-null TTFT with sub-10 ms
completion latency and zero token deltas on `trunk_reused` reviews.

## What changed

- Preserve the child turn id returned by the Guardian review
`Op::UserTurn` submission.
- Restrict Guardian review waiting to events correlated with that
submitted child turn.
- Restrict timeout/abort draining to terminal events for the same child
turn.
- Add regression coverage for stale prior-turn completions, stale
prior-turn errors, and interrupt draining in
`codex-rs/core/src/guardian/review_session.rs`.

## Verification

- `cargo test -p codex-core guardian::review_session::tests::`
- `cargo clippy -p codex-core --tests -- -D warnings`
2026-04-29 18:26:39 -07:00
Alex Zamoshchin
8b07132e09 update codex_plugins_beta_setting (from workspace settings) (#20250)
update the name after rename internally

see https://github.com/openai/openai/pull/871006
2026-04-30 00:40:25 +00:00
Eric Traut
515aa9a4fb tui: return from side chat on Ctrl-D (#20282)
## Why

Fixes #20264.

Side conversations are an ephemeral layer on top of the main chat.
Pressing `Ctrl+D` from an empty side-chat composer should unwind back to
the parent thread, matching the existing side-return behavior, instead
of falling through to the global quit shortcut and exiting Codex.

## What changed

The side-return shortcut matcher now treats `Ctrl+D` the same way it
already treats `Esc` and `Ctrl+C`. Because app-level side-return
handling runs before the chat widget's global quit handling, this
returns from `/side` while preserving normal `Ctrl+D` quit behavior
outside side conversations.

The existing shortcut coverage was updated to include lowercase and
uppercase `Ctrl+D` key events.

## Verification

- `cargo test -p codex-tui
side_return_shortcuts_match_esc_ctrl_c_and_ctrl_d`
- `cargo test -p codex-tui` starts successfully and the new shortcut
test passes, but the broader suite later aborts in the unrelated
existing test
`app::tests::attach_live_thread_for_selection_rejects_unmaterialized_fallback_threads`
with a stack overflow.
2026-04-29 17:26:11 -07:00
pakrym-oai
fedcefe9da Reduce the surface of collaboration modes (#20149)
Collaboration modes were slightly invasive both into ThreadManager
construction and ModelProvider
2026-04-29 17:22:41 -07:00
stefanstokic-oai
c8abcbf925 Import external agent sessions in background (#20284)
Summary:
- Return from external agent import before session history import
finishes
- Run session import work in the background and emit the existing
completion notification when it is done
- Serialize session imports so duplicate requests do not create
duplicate imported threads

Verification:
- cargo test -p codex-app-server external_agent_config_
- cargo test -p codex-external-agent-sessions
- just fix -p codex-app-server
- just fix -p codex-external-agent-sessions
- git diff --check
2026-04-30 00:00:41 +00:00
alexsong-oai
7bcd4626c4 Consume ai-title from external sessions and add end marker (#20261)
## Summary
- Support Claude Code `ai-title` / `aiTitle` records when detecting and
importing external agent sessions.
- Preserve existing `custom-title` / `customTitle` precedence; only fall
back to `aiTitle` when no custom title is present.
- Add coverage for both detection and import title selection, including
the custom-title-over-ai-title case.

## Testing
- `cargo test -p codex-external-agent-sessions`
- `just fix -p codex-external-agent-sessions`
2026-04-30 00:00:13 +00:00
Abhinav
8774229a89 Add hooks/list app-server RPC (#19778)
## Why

We need a way to list the available hooks to expose via the TUI and App
so users can view and manage their hooks

## What

- Adds `hooks/list` for one or more `cwd` values that returns discovered
hook metadata

## Stack

1. openai/codex#19705
2. This PR - openai/codex#19778
3. openai/codex#19840
4. openai/codex#19882

## Review Notes

The generated schema files account for most of the raw diff, these files
have the core change:

- `hooks/src/engine/discovery.rs` builds the inventory entries during
hook discovery while leaving runtime handlers focused on execution.
- `app-server/src/codex_message_processor.rs` wires `hooks/list` into
the app-server flow for each requested `cwd`.
- `app-server-protocol/src/protocol/v2.rs` defines the new v2
request/response payloads exposed on the wire.

### Core Changes

`core/src/plugins/manager.rs` adds `plugins_for_layer_stack(...)` so
`skills/list` and `hooks/list`can resolve plugin state for each
requested `cwd`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-29 23:39:57 +00:00
Michael Bolin
6eab7519b4 chore: increase release build timeout from 60 min to 90 (#20271)
Build times are creeping up, so increase the timeout as a precaution.
2026-04-29 16:19:59 -07:00
rafael-jac
98f67b15d3 Update Codex login success page UX (#20136)
## Summary

update the local login success page to match the Codex desktop auth UX
use theme-aware colors and an inline 20px Codex mark
keep the actual localhost success page aligned with the browser auth UX
PR

## Tests

<img width="1728" height="1117" alt="Screenshot 2026-04-29 at 12 00
34 PM"
src="https://github.com/user-attachments/assets/76a40c3f-07c3-452c-97da-e7c43717cd2c"
/>
2026-04-29 19:14:53 -04:00
evawong-oai
74f06dcdfb Enforce workspace metadata protections in Linux sandbox (#19852)
## Summary

Enforce FileSystemSandboxPolicy protected metadata names in the Linux
bubblewrap adapter so `.git`, `.agents`, and `.codex` remain read only
inside writable workspace roots unless the policy grants an explicit
write carveout.

## Scope

1. Translate protected metadata names from FileSystemSandboxPolicy into
bubblewrap masks for existing metadata paths.
2. Represent missing protected metadata paths as guarded mount targets
so agents cannot create `.git`, `.agents`, or `.codex` under writable
roots.
3. Preserve normal git discovery for existing repos, worktrees, and
parent repos.
4. Keep explicit user write grants working when policy allows a
protected metadata path directly.

## Not in scope

1. No shell preflight UX.
2. No TUI runtime profile propagation.
3. No macOS Seatbelt changes in this PR.

## Reviewer focus

1. This should be reviewed as the Linux enforcement adapter for the
policy primitive from PR 19846.
2. macOS enforcement already landed in PR 19847.
3. The important invariant is that `FileSystemSandboxPolicy` is the
source of truth for `.git`, `.agents`, and `.codex`.

## Validation

1. `git diff` whitespace check passed.
2. `cargo fmt` check passed with the existing stable rustfmt warning
about `imports_granularity`.
3. Full Linux sandbox Cargo test suite passed on the devbox.
4. Devbox forty six case suite passed at head
`012accb703c13bd28df5b40079a9bf183036336a`.
5. Devbox summary: pass 46, fail 0.
6. The devbox suite was run through `just c sandbox linux`.
7. Focused repo test for Viyat parent repo case passed on the devbox.
2026-04-29 16:14:14 -07:00
iceweasel-oai
13dbcda28f stop blocking unified_exec on Windows (#19435)
## Summary
- remove the Windows-specific unified-exec environment block from tool
selection
- keep `unified_exec` default-off on Windows unless the feature is
explicitly enabled
- normalize model-provided `shell_type = unified_exec` to
`shell_command` when the feature is disabled
- drop obsolete tests tied to the removed environment gate and keep the
feature-flag regression coverage

## Why
Now that the session/long-lived process backend is implemented for the
Windows sandbox, we don't need to hard disable it anymore. We will be
rolling out slowly using a feature gate.

## Impact
This allows manual Windows opt-in in CLI and app-backed flows while
preserving the existing default-off behavior for Windows users.

---------

Co-authored-by: canvrno-oai <kbond@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-04-29 16:06:33 -07:00
pakrym-oai
8de2a7a16d Add codex-core public API listing (#20243)
Summary:
- Add a checked-in codex-core public API listing generated by
cargo-public-api.
- Add scripts/regen-public-api.sh with an embedded crate list,
auto-install for cargo-public-api 0.51.0, pinned nightly, and --check
mode.
- Add Rust CI jobs on the codex Linux x64 runner pool to verify the
listing stays up to date.

Testing:
- bash -n scripts/regen-public-api.sh
- just regen-public-api --check
- yq '.' .github/workflows/rust-ci.yml
.github/workflows/rust-ci-full.yml
- git diff --check
2026-04-29 22:58:08 +00:00
Rasmus Rygaard
782191547c Add agent graph store interface (#19229)
## Summary

Persisted subagent parent/child topology currently leaks through
`StateRuntime`'s SQLite-specific thread-spawn helpers. This PR
introduces a narrow `AgentGraphStore` boundary so follow-up work can
route graph operations through a local or remote store without coupling
orchestration code directly to the state DB graph API.

## Changes

- Adds the new `codex-agent-graph-store` crate.
- Defines a flat `AgentGraphStore` trait for the v1 graph surface:
upsert edge, set edge status, list direct children, and list
descendants.
- Adds public graph types for `ThreadSpawnEdgeStatus`,
`AgentGraphStoreError`, and `AgentGraphStoreResult`.
- Implements `LocalAgentGraphStore` on top of an existing
`codex_state::StateRuntime`, preserving today's SQLite-backed
`thread_spawn_edges` behavior.
- Registers the crate in Cargo/Bazel metadata.

This PR only adds the local contract and implementation; call-site
migration and the remote gRPC store are left to the follow-up PRs in the
stack.

## Testing

- `cargo test -p codex-agent-graph-store`

The new unit tests cover local parity with the existing `StateRuntime`
graph methods, `Open`/`Closed` filtering, status updates, and stable
breadth-first descendant ordering.
2026-04-29 22:48:26 +00:00
Matthew Zeng
e20391e567 [mcp] Fix plugin MCP approval policy. (#19537)
Plugin MCP servers are loaded from plugin manifests rather than
top-level `[mcp_servers]`, so their tool approval preferences need to be
stored and applied through the owning plugin config. Without this,
choosing "Always allow" for a plugin MCP tool could write a preference
that was not reliably used on later tool calls.

## Summary
- Add plugin-scoped MCP policy config under
`plugins.<plugin>.mcp_servers`, including server enablement, tool
allow/deny lists, server defaults, and per-tool approval modes.
- Overlay plugin MCP policy onto manifest-provided server configs when
plugins are loaded.
- Route persistent "Always allow" writes for plugin MCP tools back to
the owning `plugins.<plugin>.mcp_servers.<server>.tools.<tool>` config
entry.
- Reload user config after persisting an approval and make the plugin
load cache config-aware so stale plugin MCP policy is not reused after
`config.toml` changes.
- Regenerate the config schema and add coverage for plugin MCP policy
loading, approval lookup, persistence, and stale-cache prevention.

## Testing
- `cargo test -p codex-config`
- `cargo test -p codex-core-plugins`
- `cargo test -p codex-core --lib plugin_mcp`
2026-04-29 15:40:03 -07:00
Eric Traut
4241df4d79 Escape turn metadata headers as ASCII JSON (#19620)
## Why

`x-codex-turn-metadata` is sent as an HTTP/WebSocket header, but Codex
was serializing the metadata JSON with raw UTF-8 string contents. When a
workspace path contains non-ASCII characters, common HTTP stacks can
reject or corrupt that header before the request reaches the provider.

Fixes #17468. Also addresses the duplicate WebSocket report in #19581.

## What changed

- Added `codex_utils_string::to_ascii_json_string`, a shared helper that
serializes JSON normally while escaping non-ASCII string content as
`\uXXXX`.
- Switched turn metadata header serialization, including merged
Responses API client metadata, to use the ASCII-safe JSON helper.
- Added coverage for non-ASCII workspace paths and non-ASCII client
metadata while preserving the same parsed JSON values.

## Verification

- `cargo test -p codex-utils-string`
- `cargo test -p codex-core turn_metadata`
- `just bazel-lock-check`
2026-04-29 15:35:33 -07:00
Michael Bolin
b1546008fc docs: discourage #[async_trait] and #[allow(async_fn_in_trait)] (#20242)
## Why

We have run into two avoidable problems when introducing async trait
APIs in Rust:

- `#[async_trait]` has caused materially worse build times in this
repository.
- `#[allow(async_fn_in_trait)]` makes it too easy to ship a public trait
without spelling out whether the returned future is `Send`, which hides
an important part of the trait contract.

We already have a good example of the preferred alternative in
[#16630](https://github.com/openai/codex/pull/16630) /
[`3c7f013f9735`](https://github.com/openai/codex/commit/3c7f013f9735),
but that guidance currently lives only as prior art in the codebase.
This PR documents the rule in `AGENTS.md` so contributors are more
likely to follow the native RPITIT pattern before these two shortcuts
spread further.

## What Changed

- added Rust guidance in `AGENTS.md` discouraging both `#[async_trait]`
and `#[allow(async_fn_in_trait)]`
- pointed contributors to the native RPITIT pattern with explicit `Send`
bounds on the returned future
- clarified that implementations may still use `async fn` when they
satisfy that trait contract

## Verification

- docs-only change; no tests run
2026-04-29 15:29:29 -07:00
Alex Daley
f63b19bedd [apps] Add apps MCP path override (#20231)
Summary

- Add `[features.apps_mcp_path_override]` config with a `path` field for
overriding only the built-in apps MCP path.
- Keep existing host/base URL derivation unchanged and append the
configured path after that base.
- Regenerate the config schema with the custom feature-config case.

Test Plan

- Not run for latest revision; only `just fmt` and `just
write-config-schema` were run.
- Earlier revision: `cargo test -p codex-features`
- Earlier revision: `cargo test -p codex-mcp`
2026-04-29 18:08:06 -04:00
xli-oai
8d5da3ffe5 Fallback login callback port when default is busy (#19334)
## Summary
- Keep the preferred ChatGPT login callback port `1455` first.
- Preserve the existing `/cancel` recovery for stale Codex login
servers.
- Fall back to the registered localhost callback port `1457` when `1455`
remains unavailable.

## Why
Cursor and Codex Desktop both use the ChatGPT account login callback
server. On Windows, Cursor can already be listening on `127.0.0.1:1455`
/ `[::1]:1455`, causing Codex Desktop sign-in to fail with:

`Local callback port 1455 is already in use on this machine.`

Codex already attempted to cancel a stale Codex login server on that
port, but if the listener does not release the port, the old behavior
was to fail. The new behavior falls back to `1457`, which matches the
fixed redirect URI being registered server-side in
`openai/openai#863817`. This keeps the OAuth `redirect_uri` inside
Hydra's exact allow-list instead of choosing an arbitrary ephemeral
port.

## Validation
- `just fmt`
- `cargo test -p codex-login`
- `git diff --check HEAD~1..HEAD`
2026-04-29 14:45:27 -07:00
rhan-oai
72a39e3a96 [app-server] centralize client response analytics (#20059)
## Why

The precursor PR keeps successful client responses typed until
app-server's outgoing response seam. This follow-up uses that seam to
move successful client-response analytics out of individual handlers and
into the shared sender path, while keeping filtering decisions inside
`codex-analytics`.

## What changed

- Emit successful client-response analytics centrally from
`OutgoingMessageSender::send_response`.
- Remove duplicate handler-local response tracking for the current
thread/turn lifecycle responses.
- Keep analytics ingestion selective inside `AnalyticsEventsClient`, so
unrelated client traffic is ignored before cloning or boxing.
- Collapse client-response analytics facts onto one typed path and
normalize payloads in the reducer.
- Add direct client-filter coverage plus sender-level coverage for the
centralized forwarding path.

## Verification

- `cargo test -p codex-analytics`
- `cargo test -p codex-app-server outgoing_message::tests --lib`
2026-04-29 21:22:39 +00:00
xli-oai
afbddabc8b Require remote plugin detail before uninstall (#19966)
## Summary
- Fetch remote plugin detail before sending the uninstall request.
- Use the detail response to derive the marketplace namespace and plugin
name for cache cleanup.
- Stop the uninstall before the backend POST if detail lookup fails, so
backend state and local cache state do not diverge.

## Testing
- `just fmt`
- `cargo test -p codex-app-server plugin_uninstall`
- `cargo test -p codex-core-plugins`
- `git diff --check`
2026-04-29 14:01:11 -07:00
rhan-oai
973c5c823e [app-server] type client response payloads (#20050)
## Why

`pr17088` adds typed server-originated request/response plumbing, but
successful client responses are still erased into bare JSON-RPC `result`
values before app-server can make any typed decision about them.

This precursor PR keeps successful client responses typed until the
outgoing response seam. It is intentionally limited to
protocol/app-server plumbing so the analytics behavior change can review
separately on top.

## What changed

- Add `ClientResponsePayload` as the pre-serialization client response
body type.
- Route app-server successful response paths through the typed payload
seam while preserving existing handler-local analytics behavior.
- Keep `InterruptConversation` JSON-RPC-only because it has no
`ClientResponse` variant.
- Move the new payload conversion tests into a dedicated protocol test
module.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server-protocol`
2026-04-29 20:50:47 +00:00
sayan-oai
b15074d0a4 app-server: fix outgoing sender test setup (#20258)
## Why

[#17088](https://github.com/openai/codex/pull/17088) changed
`OutgoingMessageSender::new` to require an `AnalyticsEventsClient`, but
one `command_exec` test added earlier on `main` still called the old
one-argument constructor. That leaves current `main` failing to compile
in Bazel and argument-comment-lint jobs.

## What changed

- Pass `AnalyticsEventsClient::disabled()` to the missed
`OutgoingMessageSender::new` test call site in `command_exec.rs`.

## Verification

- `cargo test -p codex-app-server
timeout_or_cancellation_reports_cancellation_without_timeout_exit_code`
2026-04-29 20:47:20 +00:00
Matthew Zeng
8ce48f9968 [tool_suggest] Improve tool_suggest triggering conditions. (#20091)
## Summary
- Tighten `tool_suggest` guidance so it prefers explicit plugin install
requests, while still allowing a connector install when the relevant
plugin is already installed and a needed connector from that plugin is
missing.
- Tell the model not to call `tool_suggest` in parallel with other
tools.

## Testing
- `cargo test -p codex-tools tool_suggest`
- `cargo test -p codex-core tool_suggest`
2026-04-29 13:41:12 -07:00
rhan-oai
0690ab0842 [codex-analytics] ingest server requests and responses (#17088)
## Why

Codex analytics needs a typed seam for app-server-originated
request/response traffic so future tool-approval analytics can consume
those facts without adding bespoke callsite tracking each time. Server
responses arrive as JSON-RPC `id + result` payloads, so analytics has to
reconstruct the matching typed response from the original typed request
while that request context still exists in app-server.

This also puts analytics on the app-server outbound path, which needs to
avoid keeping the runtime alive during shutdown. The final ownership fix
keeps the normal strong auth-manager retention in analytics and makes
the external-auth refresh bridge hold a weak back-reference to
`OutgoingMessageSender`, breaking the runtime cycle at the bridge
boundary instead of exposing retention policy through the analytics
client API.

## What changed

- Adds typed `ServerRequest` and `ServerResponse` analytics facts, plus
`AnalyticsEventsClient::track_server_request` and
`track_server_response`.
- Renames the existing client-side facts to `ClientRequest` and
`ClientResponse` so reducers can distinguish client-to-server traffic
from server-to-client traffic.
- Adds `ServerRequest::response_from_result`, allowing a stored typed
request to decode the matching typed server response from a raw JSON-RPC
result payload.
- Threads `AnalyticsEventsClient` through `OutgoingMessageSender` and
records targeted server requests, replayed targeted requests, and
matching targeted responses with the responding connection id needed for
correlation.
- Intentionally leaves broadcast server requests/responses out of
analytics for now because the current model is per connection, while
broadcasts fan one logical request out across multiple connections.
- Breaks the app-server shutdown cycle by storing
`Weak<OutgoingMessageSender>` in `ExternalAuthRefreshBridge` and
upgrading it only when an external-auth refresh is actually requested.
- Keeps reducer ingestion of the new server-side facts as no-ops for
now; this PR is plumbing for later tool-approval analytics work.

## Verification

- `cargo test -p codex-analytics`
- `cargo test -p codex-app-server outgoing_message::tests::`
- Covers typed-response reconstruction plus the targeted, replayed,
broadcast-exclusion, and response-attribution analytics paths.

## Follow-up

This PR intentionally stops at ingestion plumbing, so `ServerRequest`
and `ServerResponse` facts are still reducer no-ops. Once a follow-up PR
adds real downstream analytics output for those facts:

- replace the temporary pre-reducer observation seam with reducer tests
for the emitted event shape;
- add end-to-end coverage in `app-server/tests/suite/v2/analytics.rs`
for the real app-server workflow and captured analytics payload;
- remove the temporary sender-level observer tests added here in favor
of the real-output coverage above.

---

[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17088).
* #18748
* #18747
* #17090
* #17089
* #20241
* #20239
* __->__ #17088
2026-04-29 19:56:41 +00:00
iceweasel-oai
9d1e5df4b2 expand the set of core shell env vars for Windows. (#20089)
https://github.com/openai/codex/issues/13917 and
https://github.com/openai/codex/issues/18248 correctly identify that

```
[shell_environment_policy]
inherit = "core"
```
is not functional on Windows because it carries an insufficient set of
env vars.
This PR expands that to match the more functional set from the MCP
client
2026-04-29 19:23:46 +00:00
viyatb-oai
07c8b8c77c fix: handle deferred network proxy denials (#19184)
## Why

This bug is exposed by Guardian/auto-review approvals. With the managed
network proxy enabled, a blocked network request can be reported back
through the network approval service as an approval denial after the
command has already started. Before this change, the shell and unified
exec runtimes registered those network approval calls, but did not have
a way to observe an async proxy denial as a cancellation/failure signal
for the running process.

The result was confusing: Guardian/auto-review could correctly deny
network access, but the command path could keep running or unregister
the approval without surfacing the denial as the command failure.

## What Changed

- `NetworkApprovalService` now attaches a cancellation token to active
and deferred network approvals.
- Proxy-denial outcomes are recorded only for active registrations,
cancel the owning token, and are consumed when the approval is
finalized.
- The shell runtime combines the normal command timeout with the
network-denial cancellation token.
- Unified exec stores the deferred network approval object, terminates
tracked processes when the proxy denial arrives, and returns the denial
as a process failure while polling or completing the process.
- Tool orchestration passes the active network approval cancellation
token into the sandbox attempt and preserves deferred approval errors
instead of silently unregistering them.
- App-server `command/exec` now handles the combined
timeout-or-cancellation expiration variant used by the runtime.

## Verification

- `cargo test -p codex-core network_approval --lib`
- `cargo clippy -p codex-app-server --all-targets -- -D warnings`
- `cargo clippy -p codex-core --all-targets -- -D warnings`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-29 19:13:57 +00:00
xl-openai
73cd831952 feat: Use remote installed plugin cache for skills and MCP (#20096)
- Fetches and caches remote /installed plugin state
- Lets skills/list load skills from remote-installed cached plugins
without requiring a local marketplace entry
- Routes plugin list/startup/install/uninstall changes through async
plugin cache invalidation and MCP refresh
2026-04-29 12:09:49 -07:00
Won Park
5cf0adba93 Include auto-review rollout in feedback uploads (#20064)
## Summary

- include the live auto-review trunk rollout when `/feedback` uploads
logs
- upload that attachment as
`auto-review-rollout-<parent-thread-id>.jsonl` so it is distinguishable
from the parent rollout
- show the same auto-review attachment name in the TUI consent popup

## Scope

- this only covers the live cached auto-review trunk for the current
parent thread
- it does not add durable historical parent->auto-review lookup
- it does not add persisted rollout support for ephemeral parallel
review forks

## UI 

<img width="599" height="185" alt="Screenshot 2026-04-28 at 1 17 18 PM"
src="https://github.com/user-attachments/assets/6a0e79c2-5d21-4702-8a89-f765778bc9e9"
/>

## Validation

- `cargo test -p codex-core
cached_guardian_subagent_exposes_its_rollout_path`
- `cargo test -p codex-feedback`
- `cargo test -p codex-app-server`
- `cargo test -p codex-tui feedback_upload_consent_popup_snapshot`
- `cargo test -p codex-tui
feedback_good_result_consent_popup_includes_connectivity_diagnostics_filename`

## Known unrelated local failures

- `cargo test -p codex-core` currently fails in the pre-existing proxy
env snapshot test
`tools::runtimes::tests::maybe_wrap_shell_lc_with_snapshot_keeps_user_proxy_env_when_proxy_inactive`
- `cargo test -p codex-tui` currently hits pre-existing `status::*`
snapshot drift unrelated to this change

## Follow-Up 
- persist parallel auto-review fork sessions so /feedback can include
their rollout history too
- attach each persisted fork as its own clearly named file, for example
auto-review-rollout-<parent-thread-id>-fork <n>.jsonl, instead of
merging multiple Guardian sessions into one attachment
- keep the same live-session-only scope initially; durable historical
parent -> auto-review lookup can remain a separate decision if we later
need feedback from resumed sessions
2026-04-29 11:44:55 -07:00
friel-openai
05fd904572 test protocol: lock inter-agent commentary phase (#20046)
## Summary
- add a regression test for
`InterAgentCommunication::to_response_input_item`
- assert replayed inter-agent messages keep `phase:
Some(MessagePhase::Commentary)`

## Test plan
- `cargo test -p codex-protocol`
- `just argument-comment-lint`
2026-04-29 11:24:17 -07:00
pakrym-oai
8356806fc9 Add ThreadManager sample crate (#20141)
Summary:
- Add codex-thread-manager-sample, a one-shot binary that starts a
ThreadManager thread, submits a prompt, and prints the final assistant
output.
- Pass ThreadStore into ThreadManager::new and expose
thread_store_from_config for existing callsites.
- Build the sample Config directly with only --model and prompt inputs.

Verification:
- just fmt
- cargo check -p codex-thread-manager-sample -p codex-app-server -p
codex-mcp-server
- git diff --check

Tests: Not run per request.
2026-04-29 11:21:06 -07:00
joeytrasatti-openai
47fba5df4a [codex-backend] Prefer sqlite git info for rollout-path reads (#20228)
### Summary

- Path-based local thread reads currently return rollout/session git
metadata directly, so `thread/resume` can disagree with persisted SQLite
metadata for the same thread.
- Merge non-null SQLite git fields over rollout-path reads while keeping
rollout values as fallbacks for fields SQLite does not know.
- Add focused regression coverage for rollout-path reads so persisted
branch updates are preserved during resume.

### Testing

- `cargo test -p codex-thread-store`
2026-04-29 17:54:37 +00:00
Eric Traut
d0204c3dcc TUI: Remove core protocol dependency [3/7] (#20174)
## Why

This is part 3 of a 7-PR stack to remove direct
`codex_protocol::protocol` usage from `codex-tui` while keeping each
layer reviewable and shippable.

With `AppCommand` now explicit, the internal app event bus can carry TUI
commands directly instead of bouncing through core `Op` values.

## What changed

- Changed `AppEvent::CodexOp` and `AppEvent::SubmitThreadOp` to carry
`AppCommand`.
- Updated app-event senders and direct emitters to submit `AppCommand`
values.
- Adjusted tests to match `AppCommand` or convert back through
`into_core()` where they intentionally assert legacy payload equality.

## Verification

- `cargo test -p codex-tui --no-run`
2026-04-29 10:52:10 -07:00
Eric Traut
445629815c TUI: Remove core protocol dependency [2/7] (#20173)
## Why

This is part 2 of a 7-PR stack to remove direct
`codex_protocol::protocol` usage from `codex-tui` while keeping each
layer reviewable and shippable.

Before the TUI event bus can stop carrying core `Op` values,
`AppCommand` needs to be an owned TUI command shape rather than a thin
wrapper around `Op`.

## What changed

- Replaced the opaque `AppCommand(Op)` wrapper with explicit owned
variants for the commands the TUI submits.
- Preserved `into_core()` so this layer does not yet change the
app/thread submission boundary.
- Kept existing core leaf types for now so this remains a mechanical
command-shape refactor.

## Verification

- `cargo check -p codex-tui`
2026-04-29 10:28:04 -07:00
cassirer-openai
df966996a7 [rollout-tracer] Match analysis messages on encrypted id. (#20123)
In some setups the summary or raw content can be dropped between
requests. This triggers a check in the reducer which expects that the
messages should remain identical between requests.

This PR relaxes the checks to only focus on the encrypted ID instead. It
also changes the reducer to keep the most rich version of the message
observed during the rollout (this ensures that we don't accidentally
lose the CoT nor summary when available).
2026-04-29 17:22:24 +00:00
iceweasel-oai
cecca5ae06 Improve Windows process management edge cases (#19211)
## Summary

Some improvements to Windows process-management issues from
https://github.com/openai/codex/pull/15578

- bound the elevated runner pipe-connect handshake instead of waiting
forever on blocking pipe connects
- terminate the spawned runner if that handshake fails, so timeout/error
paths do not leave a stray `codex-command-runner.exe`
- loop on partial `WriteFile` results when forwarding stdin in the
elevated runner, so input is not silently truncated
- fix the concrete HANDLE/SID cleanup paths in the runner setup code
- keep draining driver-backed stdout/stderr after exit until the backend
closes, instead of dropping the tail after a fixed 200ms grace period
- reuse `LocalSid` for SID ownership and add more explanatory comments
around the ownership/concurrency-sensitive code paths

## Why

The original PR fixed a lot of Windows session plumbing, but there were
still a few sharp process-lifecycle edges:

- some elevated runner handshakes could block forever
- the new timeout path could still orphan the spawned runner process
- stdin forwarding still assumed a single `WriteFile` consumed the whole
buffer
- a few raw HANDLE/SID error paths still leaked
- driver-backed output could still lose the last chunk of stdout/stderr
on slower backends

## Validation

- `cargo fmt -p codex-windows-sandbox -p codex-utils-pty`
- `cargo test -p codex-utils-pty`
- `cargo test -p codex-windows-sandbox finish_driver_spawn`
- `cargo test -p codex-windows-sandbox runner_`

Ran a local test matrix of unified-exec and shell_tool tests, all
passing
2026-04-29 10:00:01 -07:00
Eric Traut
1c420a90cd TUI: Remove core protocol dependency [1/7] (#20172)
## Why

This is part 1 of a 7-PR stack to remove direct
`codex_protocol::protocol` usage from `codex-tui` while keeping each
layer reviewable and shippable.

This first layer reduces the size of the later `chatwidget` diff by
mechanically moving MCP startup bookkeeping out of the central widget
file without changing the event shapes or behavior.

## What changed

- Extracted MCP startup status handling into
`tui/src/chatwidget/mcp_startup.rs`.
- Kept the existing core event types in place for this purely mechanical
move.
- Updated the MCP startup tests to import the moved test-only event
types directly.

## Verification

- `cargo test -p codex-tui chatwidget::tests::mcp_startup`
2026-04-29 09:10:22 -07:00
Eric Traut
91ca551df8 Use /goal resume for paused goals (#20082)
## Why

The paused goal statusline currently points users at `/goal` to unpause
a goal, but bare `/goal` is the summary command and does not change the
goal state. Instead of making `/goal` mutate state only when a goal is
paused, this gives the action an explicit command that reads naturally
in the UI.

## What Changed

- Replace `/goal unpause` with `/goal resume` for reactivating a paused
goal.
- Update the paused goal statusline and `/goal` summary copy to point at
`/goal resume`.
2026-04-29 08:56:02 -07:00
jif-oai
70ac0f123c Make multi-agent v2 ignore agents.max_depth (#20180)
## Why

`agents.max_depth` is a legacy multi-agent v1 guard. Multi-agent v2 uses
task-path routing and its own session/thread limits, so v2 should not
reject nested `spawn_agent` calls just because the thread-spawn depth
has reached the v1 maximum.

Keeping the v1 depth guard active in v2 prevents deeper task trees even
though the v2 path still needs the depth value only for lineage and
task-path metadata.

## What Changed

- Removed the depth-limit rejection from the multi-agent v2
`spawn_agent` handler while still computing child depth for lineage/path
metadata.
- Made the depth-based disabling of legacy `SpawnCsv`/`Collab` tools
apply only when `Feature::MultiAgentV2` is disabled.
- Added `multi_agent_v2_spawn_agent_ignores_configured_max_depth` to
cover a v2 child spawning another agent when `agent_max_depth = 1`,
while the existing v1 depth-limit tests continue to enforce the legacy
behavior.

## Verification

- `cargo test -p codex-core
multi_agent_v2_spawn_agent_ignores_configured_max_depth -- --nocapture`
- `cargo test -p codex-core depth_limit -- --nocapture`
- `cargo test -p codex-core tools::handlers::multi_agents::tests --
--nocapture`
2026-04-29 12:23:00 +02:00
jif-oai
c41b74c453 nit: drop old memories things (#20186)
Drop legacy code
2026-04-29 12:19:50 +02:00
iceweasel-oai
5cac3f896d Fix Windows pseudoconsole attribute handling for sandboxed PTY sessions (#20042)
## Summary
Fix the Windows sandbox PTY spawn path to pass the pseudoconsole handle
value directly into `UpdateProcThreadAttribute`.

## Why
Sandboxed `unified_exec` PTY sessions on Windows were failing during
child process startup with `0xc0000142` (`STATUS_DLL_INIT_FAILED`). In
practice this showed up as PowerShell DLL init popups when the sandboxed
background-terminal path tried to launch an interactive shell.

The root cause was that we were passing a pointer to a local `isize`
variable instead of the pseudoconsole handle value in the form Windows
expects for `PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE`.

## Validation
- `cargo build -p codex-windows-sandbox --bins`
- Reproduced the real sandboxed `codex exec` flow with
`windows.sandbox_private_desktop=true`
- Verified a `tty=true` interactive session launched through the normal
PowerShell wrapper, printed `READY`, accepted follow-up stdin, and
exited cleanly
- Confirmed no new `0xc0000142` / `Application Popup` events appeared
after the successful repro
2026-04-29 11:59:45 +02:00
alexsong-oai
d92c909ee4 Fix migrated hook path rewriting (#20144)
## Summary
- Rewrite migrated external-agent hook commands by replacing the full
hook script path token instead of only the `.claude/hooks/` segment.
- Preserve quoting around the full rewritten target path so script names
with spaces, absolute paths, and shell operators/redirection continue to
work.
- Apply `.claude/settings.local.json` over `.claude/settings.json` for
config, MCP, and plugin migration so local scope matches Claude settings
precedence.
- Skip legacy command markdown without `description` frontmatter,
including README-style docs under `.claude/commands`.

## Root Cause
The previous hook rewrite handled `.claude/hooks/` as a substring
replacement. For absolute source commands, that left the original
project-root prefix before the newly quoted `.codex/hooks` directory,
producing invalid commands like
`project/'project/.codex/hooks'/script.sh`.

The migration also only used project `settings.json` for
config/MCP/plugin decisions, so local settings such as
`disabledMcpjsonServers` could be ignored even though Claude gives local
settings higher precedence than project settings.

## Validation
- `just fmt`
- `cargo test -p codex-external-agent-migration`
- `cargo test -p codex-app-server external_agent_config`
- `just fix -p codex-external-agent-migration`
- `just fix -p codex-app-server`
- `git diff --check`
2026-04-29 00:46:11 -07:00
viyatb-oai
5597925155 feat(cli): add sandbox profile config controls (#20118)
## Why

The explicit profile path from #20117 is meant for standalone testing,
but it still inherited the
shell cwd and all managed requirements implicitly. The pre-existing
launcher path even called out
that it did not support a separate cwd yet in

[`debug_sandbox.rs`](509453f688/codex-rs/cli/src/debug_sandbox.rs (L174-L179)).

For a standalone command, the useful default is to let the caller choose
the project directory being
tested and to avoid administrator-provided constraints unless the caller
explicitly wants to test
those too.

## What changed

- Add explicit-profile-only `-C/--cd DIR`, and use that cwd for both
profile resolution and command
  execution.
- Add explicit-profile-only `--include-managed-config`.
- Make explicit profile mode skip managed requirement sources by
default, including cloud
requirements, MDM requirements, `/etc/codex/requirements.toml`, and the
legacy managed-config
  requirements projection.
- Preserve all existing invocations outside the explicit-profile path.

## Stack

1. #20117 `sandbox-ui-profile`
2. #20118 `sandbox-ui-config` --> this PR

Both PRs are additive. Replay JSON is intentionally deferred to a
follow-up design pass.

## Tests ran

- `cargo test -p codex-cli debug_sandbox`
- `cargo test -p codex-cli sandbox_macos_`
- `cargo test -p codex-core
load_config_layers_can_ignore_managed_requirements`
- `cargo test -p codex-core
load_config_layers_includes_cloud_requirements`
- macOS branch-binary smoke on the rebased top of stack: `-C` changed
execution cwd, explicit
profile mode omitted managed proxy env under `env -i`, and
`--include-managed-config` restored it.
- Linux devbox branch-binary smoke on the rebased top of stack: `-C`
changed execution cwd for
  built-in and user-defined explicit profiles.
2026-04-29 06:55:51 +00:00
Andrey Mishchenko
857146b328 Delete multi_agent_v2 followup_task interrupt parameter (#20139)
Messages sent with `followup_task` already arrive at their target
recipient promptly (at message boundaries while sampling, or after the
pending tool call completes) -- having `interrupt` is not worth the
added complexity.
2026-04-28 23:19:48 -07:00
viyatb-oai
6ed0440611 feat(cli): add explicit sandbox permission profiles (#20117)
## Why

`codex sandbox` is useful for exercising sandbox behavior directly, but
before this stack the CLI
only picked up permission profiles indirectly from the active config.
The existing debug-sandbox path
already compiled `[permissions]` profiles through normal config loading,
as covered by the existing
profile tests in
[`debug_sandbox.rs`](de2ccf9473/codex-rs/cli/src/debug_sandbox.rs (L715-L760)).

This adds the smallest stable entry point first: an explicit profile
selector that reuses the same
config machinery as normal Codex config, so standalone testing becomes
possible without changing
current no-selector behavior.

## What changed

- Add additive `--permissions-profile NAME` support to `codex sandbox
macos|linux|windows`.
- Resolve built-in and user-defined profile names by feeding
`default_permissions` through the
existing config compilation path instead of inventing a sandbox-only
parser.
- Make an explicit selector win over an ambient active profile's legacy
`sandbox_mode`.
- Keep the existing no-selector behavior unchanged.

## Stack

1. #20117 `sandbox-ui-profile` --> this PR
2. #20118 `sandbox-ui-config`

Both PRs are additive. Replay JSON is intentionally deferred to a
follow-up design pass.

## Tests ran

- `cargo test -p codex-cli debug_sandbox`
- `cargo test -p codex-cli sandbox_macos_parses_permissions_profile`
- `cargo test -p codex-core
cli_override_takes_precedence_over_profile_sandbox_mode`
- macOS branch-binary smoke on the rebased top of stack: built-in
`:workspace` and user-defined
  profiles both executed successfully through `--permissions-profile`.
- Linux devbox branch-binary smoke on the rebased top of stack: built-in
`:workspace` and
user-defined profiles both executed successfully through
`--permissions-profile`.
2026-04-29 06:18:16 +00:00
Dylan Hurd
3d10ba9f36 chore(cli) deprecate --full-auto (#20133)
## Summary
Starts the process of getting rid of `--full-auto`, with some
concessions:
1. Fully removes the command from the tui, since it just resolves to the
default permissions there, and encourages users to use the one-time
trust flow if they're not in a trusted repo.
2. Marks the command as deprecated in `codex exec`, in case users are
actively relying on this. We'll remove in an upcoming n+X release.
3. Cleans up some of the `codex sandbox` cli logic, to keep supporting
legacy sandbox policies for now.

This isn't the cleanest setup, but I think it is worthwhile to warn
users for one release before hard-removing it.

## Testing 
- [x] Updated unit tests
2026-04-29 04:41:30 +00:00
starr-openai
e1ec9e63a0 Add environment provider snapshot (#20058)
## Summary
- Change `EnvironmentProvider` to return concrete `Environment`
instances instead of `EnvironmentConfigurations`.
- Make `DefaultEnvironmentProvider` provide the provider-visible `local`
environment plus optional `remote` environment from
`CODEX_EXEC_SERVER_URL`.
- Keep `EnvironmentManager` as the concrete cache while exposing its own
explicit local environment for `local_environment()` fallback paths.

## Validation
- `just fmt`
- `git diff --check`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 20:05:18 -07:00
xl-openai
6f328d5e02 Soften skill description budget warnings (#20112)
Updates skill description budget messaging to be less alarming
2026-04-28 19:56:25 -07:00
Michael Bolin
e6db1a9442 linux-sandbox: switch helper plumbing to PermissionProfile (#20106)
## Why

`PermissionProfile` is the canonical runtime permission model in the
Rust workspace, but the Linux sandbox helper still accepted a legacy
`SandboxPolicy` plus separate filesystem and network policy flags. That
translation layer made the helper interface harder to reason about and
left `linux-sandbox`-specific callers and tests coupled to the legacy
policy representation.

This change moves the helper onto `PermissionProfile` directly so the
Linux sandbox plumbing matches the rest of the permission stack.

## What changed

- changed `codex-linux-sandbox` to accept `--permission-profile` and
derive the runtime filesystem and network policies internally
- updated the in-process seccomp and legacy Landlock path in
`codex-rs/linux-sandbox` to operate on `PermissionProfile`
- updated Linux sandbox argv construction in `codex-rs/sandboxing`,
`codex-rs/core`, and the CLI debug sandbox path to pass the canonical
profile instead of serializing compatibility policy projections
- simplified the Linux sandbox tests to build the exact permission
profile under test, including the managed-proxy path and
direct-runtime-enforcement carveout coverage
- removed helper-local `SandboxPolicy` usage from `bwrap` tests where
`FileSystemSandboxPolicy` is already the value being exercised

## Testing

- `cargo test -p codex-sandboxing`
- `cargo test -p codex-linux-sandbox` (on this macOS host, the crate
compiled cleanly and its Linux-only tests were cfg-gated)
- `cargo test -p codex-core --no-run`
- `cargo test -p codex-cli --no-run`
2026-04-28 19:43:44 -07:00
Celia Chen
80fb0704ee feat: update Bedrock Mantle endpoint and GPT-5.4 model ID (#20109)
## Summary

Amazon Bedrock Mantle's OpenAI-compatible endpoint now lives under
`/openai/v1`, and the GPT-5.4 Mantle model ID no longer uses the `-cmb`
suffix. This updates Codex's built-in Bedrock provider configuration so
generated providers and the static Bedrock catalog use the current
endpoint and model ID.

## Changes

- Update the Bedrock Mantle base URL from
`https://bedrock-mantle.{region}.api.aws/v1` to
`https://bedrock-mantle.{region}.api.aws/openai/v1`.
- Update the Amazon Bedrock default base URL in
`codex-model-provider-info`.
- Change the Bedrock GPT-5.4 catalog slug from `openai.gpt-5.4-cmb` to
`openai.gpt-5.4`.
- Align provider and catalog tests with the new URL and model ID.

## Test Plan

- Manual smoke test:

  ```shell
  target/debug/codex \
      -m openai.gpt-5.4 \
      -c 'model_provider="amazon-bedrock"' \
      -c 'model_providers.amazon-bedrock.aws.region="us-west-2"'
  ```
2026-04-29 01:37:21 +00:00
Celia Chen
8c47e36504 feat: expose provider capability bounds to app server clients (#20049)
follow up of #19442. The app server now exposes provider-derived bounds
through a new v2 `modelProvider/read` method. The response reports the
configured provider map key as `modelProvider` and returns the effective
capability booleans so clients can align their UI with the same
provider-owned limits used by core.
2026-04-29 01:36:19 +00:00
canvrno-oai
4c39ad33cb Fix plugin list workspace settings test isolation (#20086)
Fixes test that often fails locally when running `cargo test`
- Add an app-server test helper that combines managed-config isolation
with custom env overrides.
- Isolate `HOME` / `USERPROFILE` in plugin-list workspace settings tests
so host home marketplaces do not affect results.
2026-04-28 18:34:38 -07:00
canvrno-oai
24be9ac0a4 Restore TUI working status after steer message is set (#19939)
Fix for #19925

Restore the `Working` indicator after a streamed final answer finishes
when a user steer message is sent.
Add regression coverage for long output plus a mid-stream steer:
`cargo test -p codex-tui
final_answer_completion_restores_status_indicator_for_pending_steer`

Duplication/testing steps:
1. Start a new thread and ask for a long response.
2. While the response is streaming, submit a steer message.
3. When the first response finishes, observe whether `Working...` is
shown while waiting for the steer message response.
2026-04-28 18:10:40 -07:00
Michael Bolin
c9f7c88f3d fix: restore live event submit path for apply patch tests (#20108)
## Summary

This fixes the CI regression introduced by
[#20040](https://github.com/openai/codex/pull/20040).

That PR migrated several `apply_patch_cli` tests from direct
`codex.submit(Op::UserTurn { ... })` calls to `harness.submit(...)`.
`harness.submit()` waits for `TurnComplete` before returning, which
drains the same event stream that these tests use to assert `TurnDiff`,
`PatchApplyUpdated`, and related live events. The regressed tests then
timed out waiting for events that had already been consumed.

This change restores a no-wait submit path for the event-observing
`apply_patch_cli` tests so they can watch the turn stream directly
again.

## What Changed

- added a local `submit_without_wait(...)` helper in
`codex-rs/core/tests/suite/apply_patch_cli.rs`
- switched the `apply_patch_cli` tests that assert live turn events back
to that helper
- left the profile-backed `harness.submit(...)` migration in place for
tests that only care about final filesystem or tool output state

## Why macOS Looked Green

In the failing run
[25084487331](https://github.com/openai/codex/actions/runs/25084487331),
`//codex-rs/core:core-all-test` was cached on macOS, so the regressed
tests were not rerun there. The Linux GNU, Linux MUSL, and Windows Bazel
jobs reran the target and exposed the failure.

## Verification

- `cargo test -p codex-core apply_patch_ -- --nocapture`
- previously failing local cases now pass again:
  - `apply_patch_cli_move_without_content_change_has_no_turn_diff`
  - `apply_patch_turn_diff_for_rename_with_content_change`
  - `apply_patch_aggregates_diff_across_multiple_tool_calls`
2026-04-28 18:09:20 -07:00
Celia Chen
f8fe96d548 feat: disable capabilities by model provider (#19442)
## Why

Unsupported features must fail closed and Codex must not expose
OpenAI-hosted fallback paths when the active provider cannot support
them. In practice, Bedrock should not surface app connectors, MCP
servers, tool search/suggestions, image generation, web search, or JS
REPL until those paths are explicitly supported for that provider.

This PR moves that decision into provider-owned capability metadata
instead of scattering Bedrock-specific checks across callers.

## What changed

- Adds `ProviderCapabilities` to `codex-model-provider`, with default
support for existing providers and a Bedrock override that disables
unsupported launch surfaces.
- Adds `ToolCapabilityBounds` to `codex-tools` so provider capability
limits can clamp otherwise-enabled tool config.
- Applies capability bounds when building session and review-thread tool
config.
- Routes MCP/app connector configuration through
`McpManager::mcp_config`, which filters configured MCP servers and app
connectors based on the active provider.
- Updates app-server MCP list/read paths to use the filtered MCP config.
- Adds coverage for default provider capabilities, Bedrock disabled
capabilities, and optional tool-surface clamping.

## Testing

built locally and verified that bedrock responses api now return without
errors calling unsupported tools.
2026-04-28 17:51:30 -07:00
alexsong-oai
cb8b1bbcd6 Support detect and import MCP, Subagents, hooks, commands from external (#19949)
## Why
This PR expands the migration path so Codex can detect and import MCP
server config, hooks, commands, and subagents configs in a Codex-native
shape.

## What changed

- Added a `codex-external-agent-migration` crate that owns conversion
logic for external-agent MCP servers, hooks, commands, and subagents.
- Extended the app-server external-agent config detection/import API
with migration item types for MCP server config, hooks, commands, and
subagents.

## Migration strategy

The migration is intentionally conservative: Codex only imports
external-agent config that can be represented safely in Codex today.
Unsupported or ambiguous config is skipped instead of being partially
translated into behavior that may not match the source system.

- **MCP servers**: import supported stdio and HTTP MCP server
definitions into `mcp_servers`. Disabled servers and servers filtered
out by source `enabledMcpjsonServers` / `disabledMcpjsonServers` are
skipped. Project-scoped MCP entries from `.claude.json` are included
when they match the repo path.
- **Hooks**: import only supported command hooks into
`.codex/hooks.json`. Unsupported hook features such as conditional
groups, async handlers, prompt/http hooks, or unknown fields are
skipped. Referenced hook scripts are copied into `.codex/hooks/`,
preserving any existing target scripts.
- **Commands**: import supported external commands as Codex skills under
`.agents/skills/source-command-*`. Commands that rely on source runtime
expansion such as `$ARGUMENTS`, `$1`, `@file` references, shell
interpolation, or colliding generated names are skipped.
- **Subagents**: import valid subagent Markdown files into
`.codex/agents/*.toml` when they have the minimum Codex agent fields.
Source model names are not migrated, so imported agents keep the user’s
Codex default model; compatible reasoning effort and sandbox mode are
migrated when present.
- **Skills and project guidance**: copy missing skill directories into
`.agents/skills` and migrate `CLAUDE.md` guidance into `AGENTS.md`,
rewriting source-agent terminology to Codex terminology where
appropriate.
- **Detection details**: detected migration items include lightweight
details for UI preview, such as MCP server names, hook event names,
generated command skill names, and subagent names. Import still
recomputes from disk instead of trusting details as the source of truth.

- Adds focused coverage for the new migration behavior and app-server
import flow.

## Verification

- `cargo test -p codex-external-agent-migration`
- `cargo test -p codex-hooks`
- `cargo test -p codex-app-server external_agent_config`
- `just bazel-lock-check`
2026-04-29 00:45:24 +00:00
Matthew Zeng
ebdf3a878c Support disabling tool suggest for specific tools. (#20072)
## Summary
- Add `disable_tool_suggest` to app and plugin config, schema, and
TypeScript output
- Exclude disabled connectors and plugins from tool suggestion discovery
- Persist "never show again" tool-suggestion choices back into
`config.toml`
- Update config docs and add coverage for connector and plugin
suppression

## Testing
- Added and updated unit tests for config persistence and tool-suggest
filtering
- Not run (not requested)
2026-04-29 00:19:34 +00:00
Michael Bolin
1211a90a35 core tests: migrate hook turns to profiles (#20041)
## Summary
- Removes `SandboxPolicy` from the hooks test suite.
- Submits hook-related turns with explicit `PermissionProfile` values
for disabled, read-only, and workspace-write cases.
- Preserves the managed-network hook test by configuring and submitting
a workspace-write profile with enabled network, allowing the existing
requirements-backed proxy path to remain covered.

## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:18:45 -07:00
Michael Bolin
1fed948c66 core tests: migrate apply patch turns to profiles (#20040)
## Summary
- Removes `SandboxPolicy` from the apply-patch CLI test suite.
- Uses the harness' profile-backed submit helper for danger/no-sandbox
turns instead of constructing `Op::UserTurn` manually with legacy
fields.
- Converts the workspace-write traversal cases to submit
`PermissionProfile::workspace_write_with(...)` directly.

## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:18:19 -07:00
Michael Bolin
1dae5788e1 core tests: migrate rmcp turns to profiles (#20037)
## Summary
- Removes `SandboxPolicy` from the RMCP client test suite.
- Adds shared read-only user-turn helpers that submit
`PermissionProfile::read_only()` plus the legacy compatibility
projection required by the current `Op::UserTurn` shape.
- Keeps sandbox metadata assertions intact by deriving the expected
legacy `sandboxPolicy` value from the same read-only profile used for
the turn.

## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:17:47 -07:00
Michael Bolin
6662c0f312 core tests: migrate compact turns to profiles (#20035)
## Summary
- Removes the remaining `SandboxPolicy` usage from the compaction test
suite.
- Adds a small local helper for direct `Op::UserTurn` construction so
these tests send `PermissionProfile::Disabled` plus the legacy
compatibility projection required by the protocol field.
- Keeps the existing danger/full-access behavior while exercising the
canonical permission profile path.

## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:17:12 -07:00
Michael Bolin
026df712cc core tests: migrate zsh-fork permissions to profiles (#20034)
## Summary
- Updates the zsh-fork test helper to configure `PermissionProfile`
directly instead of constructing a legacy `SandboxPolicy`.
- Sends permission-profile-backed turns from the skill approval zsh-fork
tests so the runtime and request path exercise the canonical permissions
model.
- Leaves the broader approvals suite on legacy policies for now, except
for the zsh-fork test that shares this helper.

## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:15:58 -07:00
Michael Bolin
1ea90410e1 core tests: migrate request permissions tool turns to profiles (#20033)
## Summary

This migrates the macOS request-permissions tool tests from legacy
`SandboxPolicy` setup to `PermissionProfile` setup. The tests still
exercise the same workspace-write baseline and request-permission
grants, but the canonical permissions value is now the profile.

## Changes

- Replaces the `workspace_write_excluding_tmp()` helper with a
`PermissionProfile::workspace_write_with()` helper.
- Applies test config through `Permissions::set_permission_profile()`.
- Uses `turn_permission_fields()` for `Op::UserTurn` compatibility
fields.
- Removes the `SandboxPolicy` import from `request_permissions_tool.rs`.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:15:13 -07:00
Michael Bolin
af39e488bc core tests: migrate prompt caching turns to profiles (#20032)
## Summary

This removes the explicit `SandboxPolicy` constructors from
`core/tests/suite/prompt_caching.rs`. The tests still exercise the same
prompt-cache invariants across permission and turn-context changes, but
the permission source is now `PermissionProfile`.

## Changes

- Uses `PermissionProfile::workspace_write_with()` for workspace-write
override scenarios.
- Uses `PermissionProfile::Disabled` for the no-sandbox per-turn
override.
- Projects profiles through `turn_permission_fields()` or
`to_legacy_sandbox_policy()` only to populate compatibility fields on
existing ops.
- Removes the `SandboxPolicy` import from `prompt_caching.rs`.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:13:53 -07:00
Michael Bolin
5d08315c00 core tests: migrate exec policy turns to profiles (#20030)
## Summary

This migrates `core/tests/suite/exec_policy.rs` away from legacy
`SandboxPolicy` turn construction. These tests all use no-sandbox turns
to exercise exec-policy behavior, so `PermissionProfile::Disabled` is
the canonical representation.

## Changes

- Replaces direct `SandboxPolicy::DangerFullAccess` turn fields with
`PermissionProfile::Disabled`.
- Uses `turn_permission_fields()` to populate the compatibility
`sandbox_policy` field required by `Op::UserTurn`.
- Removes the `SandboxPolicy` import from `exec_policy.rs`.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:12:48 -07:00
Michael Bolin
b599849d86 core tests: migrate permissions message tests to profiles (#20028)
## Summary

This removes another test-only `SandboxPolicy` dependency by configuring
`permissions_messages.rs` with a `PermissionProfile` directly. The test
still verifies the rendered compatibility permissions text, but now
obtains the legacy projection from the loaded `Config` rather than using
`SandboxPolicy` as the source of truth.

## Changes

- Builds the workspace-write test setup with
`PermissionProfile::workspace_write_with()`.
- Applies that profile through `Permissions::set_permission_profile()`.
- Uses `Config::legacy_sandbox_policy()` only for the expected
`PermissionsInstructions` compatibility rendering.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:12:10 -07:00
Michael Bolin
3ef09c71d3 core tests: migrate tools tests to permission profiles (#20027)
## Summary

This continues the test-side migration away from `SandboxPolicy` by
removing the remaining legacy policy setup in
`core/tests/suite/tools.rs`. The affected test was already modeling a
profile-backed filesystem policy with a deny-read glob, so configuring
the test through `Permissions::set_permission_profile()` is a better
match for the behavior being exercised.

## Changes

- Drops the `SandboxPolicy` import from `core/tests/suite/tools.rs`.
- Configures the glob deny-read shell test directly with a
`PermissionProfile` instead of creating a legacy read-only policy first.
- Submits the test turn with the session permission profile so the
deny-read glob remains active for the command under test.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:11:43 -07:00
Michael Bolin
8d3992d830 core tests: migrate plan item turns to profiles (#20026)
## Why

The core item tests still had a cluster of plan-mode `Op::UserTurn`
literals that used `SandboxPolicy::DangerFullAccess` and omitted
`permission_profile`. These tests are validating emitted item lifecycle
events, so keeping them on the legacy sandbox-only turn shape adds noise
to the broader permissions migration without testing legacy behavior.

## What Changed

- Adds a local `disabled_plan_turn()` helper that preserves the existing
`std::env::current_dir()` turn cwd behavior.
- Uses `turn_permission_fields(PermissionProfile::Disabled, cwd)` to
populate both the compatibility `sandbox_policy` and canonical
`permission_profile` fields.
- Replaces the plan-mode hand-built turns in
`codex-rs/core/tests/suite/items.rs`, removing all `SandboxPolicy`
references from that file and reducing remaining `codex-rs/core/tests`
`SandboxPolicy` files from 16 to 15.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:11:17 -07:00
Michael Bolin
162f4e3183 core tests: migrate safety check turns to profiles (#20024)
## Why

This stack is retiring direct `SandboxPolicy` construction from tests so
core coverage exercises the same `PermissionProfile` turn path used by
runtime code. `safety_check_downgrade.rs` still submitted each test turn
as `SandboxPolicy::DangerFullAccess` with no permission profile, even
though the tests are about model verification/reroute behavior rather
than legacy sandbox conversion.

## What Changed

- Adds a local `disabled_text_turn()` helper that derives both the
compatibility `sandbox_policy` and canonical `permission_profile` from
`PermissionProfile::Disabled`.
- Replaces repeated hand-built `Op::UserTurn` literals in
`codex-rs/core/tests/suite/safety_check_downgrade.rs` with that helper.
- Removes all `SandboxPolicy` references from the safety-check suite,
reducing the remaining `codex-rs/core/tests` files that mention
`SandboxPolicy` from 17 to 16.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:10:42 -07:00
Michael Bolin
2a8ce9b319 core tests: migrate view image turns to profiles (#20021)
## Why

This stack is removing direct `SandboxPolicy` usage from test code so
new tests exercise the same `PermissionProfile` path that runtime code
now treats as canonical. `view_image.rs` still built `Op::UserTurn`
requests with `SandboxPolicy::DangerFullAccess` and no permission
profile, which kept another core test module on the legacy turn shape.

## What Changed

- Adds a small `disabled_user_turn()` helper for the view-image suite
that derives the compatibility `sandbox_policy` and canonical
`permission_profile` from `PermissionProfile::Disabled`.
- Replaces repeated direct `Op::UserTurn` literals in
`codex-rs/core/tests/suite/view_image.rs` with that helper.
- Removes all `SandboxPolicy` references from `view_image.rs`, reducing
the remaining `codex-rs/core/tests` files that mention `SandboxPolicy`
from 18 to 17.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:09:48 -07:00
Michael Bolin
d77d23da2e core tests: migrate model/personality turns to profiles (#20018)
## Summary

- Migrates `model_switching.rs` and `personality.rs` direct
`Op::UserTurn` construction from legacy `SandboxPolicy` literals to
`PermissionProfile`-backed turn fields.
- Adds small local helpers in each file so tests keep asserting
model/personality behavior without repeating permission plumbing.
- Reduces `rg -l '\bSandboxPolicy\b' codex-rs/core/tests` from 20 files
to 18; `codex-rs/tui` remains at zero `SandboxPolicy` references.

## Testing

- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:09:12 -07:00
Abhinav
5b0d9df1d0 Increase plugin hook env test timeout (#20100)
# Why

`plugin_hook_sources_run_with_plugin_env_and_plugin_source` can still
fail on Windows after the earlier file-based assertion cleanup because
the hook process itself occasionally exceeds the old 5s timeout under CI
load. When that happens, the hook run ends as `Failed` before the test
can inspect its structured output.

The Windows Bazel failure showed the hook run itself failing after
nearly 8 seconds:

```text
---- engine::tests::plugin_hook_sources_run_with_plugin_env_and_plugin_source stdout ----
thread 'engine::tests::plugin_hook_sources_run_with_plugin_env_and_plugin_source' panicked at hooks/src\engine\mod_tests.rs:428:5:
assertion failed: `(left == right)`
Diff < left / right > :
<Failed
>Completed
...
test result: FAILED. 78 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 7.96s
```

# What

- raise the flaky plugin hook env test timeout from 5s to 10s so it
matches the other executed hook tests in this module

# Validation

- `cargo test -p codex-hooks`
2026-04-28 17:08:12 -07:00
Michael Bolin
d6d79ffcc7 core tests: send model turns with permission profiles (#20016)
## Summary
- Migrate direct `Op::UserTurn` construction in remote-model tests from
legacy `SandboxPolicy::DangerFullAccess` to
`PermissionProfile::Disabled` via `turn_permission_fields()`.
- Migrate the Responses API proxy header helper from an inline
workspace-write `SandboxPolicy` to
`PermissionProfile::workspace_write()`.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 22
files after #20015 to 20 files.

## Testing
- `cargo check -p codex-core --tests`
- `just fmt`





























---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/20016).
* #20041
* #20040
* #20037
* #20035
* #20034
* #20033
* #20032
* #20030
* #20028
* #20027
* #20026
* #20024
* #20021
* #20018
* __->__ #20016
2026-04-28 17:08:04 -07:00
Michael Bolin
158b2a4201 core tests: configure profiles directly (#20015)
## Summary
- Replace legacy sandbox config setup in delegate and telemetry tests
with direct `PermissionProfile` configuration.
- Move no-sandbox and read-only test turns in `tools.rs`,
`code_mode.rs`, `user_shell_cmd.rs`, and `model_visible_layout.rs` from
legacy `SandboxPolicy` values to `PermissionProfile` helpers, while
leaving the deny-glob read-only compatibility case for a later targeted
cleanup.
- Use `PermissionProfile::read_only()` where tests need managed
read-only behavior and `PermissionProfile::Disabled` where they
intentionally need no sandbox.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 27
files after #20013 to 22 files.

## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:06:59 -07:00
Michael Bolin
52e79ee49a core tests: migrate more turns to permission profiles (#20013)
## Summary
- Migrate another batch of direct `Op::UserTurn` test construction from
legacy `SandboxPolicy` values to `PermissionProfile` inputs via
`turn_permission_fields()`.
- Replace a one-off read-only `SandboxPolicy` bridge in the macOS exec
test with `PermissionProfile::read_only()`.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 32
files at the start of the cleanup stack to 27 files.

## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
- `just fix -p codex-core`
2026-04-28 17:05:53 -07:00
Michael Bolin
7d15936e69 core tests: build user turns from permission profiles (#20011)
## Summary
- Add `turn_permission_fields()` so tests that construct `Op::UserTurn`
directly can provide a canonical `PermissionProfile` while still filling
the required legacy `sandbox_policy` compatibility field.
- Migrate direct user-turn construction in core integration tests from
`SandboxPolicy::DangerFullAccess` to `PermissionProfile::Disabled`.
- Continue reducing direct `SandboxPolicy` usage in
`codex-rs/core/tests`, from 41 files after #20010 to 32 files in this
PR.

## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
- `just fix -p core_test_support`
- `just fix -p codex-core`
2026-04-28 17:03:20 -07:00
Eric Traut
2223b31c06 Refine Codex issue digest summaries (#20097)
## Why

The `codex-issue-digest` skill was producing more detail than the daily
digest needed, and broad all-area digests could miss active issues. In
particular, issue #16088 had substantial recent comments and reactions
but did not appear in the weekly all-areas output because GitHub search
was using default relevance ranking and the collector could exhaust its
candidate cap before later search queries got a fair sample.

That made the digest look quieter than the underlying user activity and
made threshold tuning misleading.

## What changed

- Make the digest summary headline-first and summary-only by default.
- Add an explicit opt-in flow for `## Details`, so the issue table is
shown only when requested or when the prompt asks for details upfront.
- Update the collector to request GitHub issue search results with
`sort=updated` and `order=desc`.
- Apply the search candidate cap per query instead of globally across
all queries.
- Bump the collector script version to `3`.
- Add tests that cover updated sorting and per-query candidate limits.

## Verification

- `pytest
.codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py`
- `ruff check
.codex/skills/codex-issue-digest/scripts/collect_issue_digest.py
.codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py`
- `git diff --check`
- Reran the all-areas weekly collector and confirmed #16088 is now
included with `55` interactions.
2026-04-28 16:53:59 -07:00
Ruslan Nigmatullin
c6465c1ec2 app-server: notify clients of remote-control status changes (#19919)
## Why

Remote-control app-server enrollments have both an internal server id
and the environment id exposed to remote-control clients. App-server
clients need one current status snapshot that says whether remote
control is usable and which environment id, if any, is exposed.

A temporary websocket disconnect is not itself an identity change.
Account changes, stale enrollment invalidation, successful
re-enrollment, and missing ChatGPT auth are meaningful status changes.
Disabled remote control remains `disabled` regardless of auth or SQLite
state. SQLite startup failure disablement and enrollment persistence
failures are handled in #20068; this PR reports the resulting effective
status to clients.

## What changed

- Adds v2 `remoteControl/status/changed` carrying `state` and
`environmentId`.
- Adds `RemoteControlConnectionState` values: `disabled`, `connecting`,
`connected`, and `errored`.
- Exposes remote-control status updates through `RemoteControlHandle`
using a Tokio watch channel.
- Always sends the current remote-control status snapshot to newly
initialized app-server clients.
- Broadcasts status changes to initialized app-server clients when state
or environment id changes.
- Treats missing ChatGPT auth as an `errored` status while leaving it
retryable because auth can change at runtime.
- Clears `environmentId` when enrollment is cleared for account changes,
auth loss, stale backend invalidation, or disabled remote control.
- Updates app-server protocol schema fixtures, generated TypeScript,
app-server README, remote-control tests, and TUI exhaustive notification
matches.

## Stack

- Builds on #20068.

## Verification

- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server transport::remote_control --lib`
- `cargo check -p codex-tui`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
2026-04-28 23:52:14 +00:00
Gabriel Peal
5e6cbbadf7 Return None when auth refresh fails (#20092)
Right now, if Codex winds up in a state with auth but it can't refresh
the token, the user is left with an unhelpful message that says to log
out and log back in again.

Ultimately, we should prevent that from happening but if it does,
returning None will allow the caller to redirect the user back to the
login page
2026-04-28 16:15:47 -07:00
Michael Bolin
891722849d core tests: submit turns with permission profiles (#20010)
## Summary

- Add `PermissionProfile`-based turn submission helpers to
`core_test_support`, while keeping the legacy `SandboxPolicy` helper for
tests that intentionally exercise legacy fallback behavior.
- Switch the default `TestCodex::submit_turn()` path to send a real
`PermissionProfile` plus the required legacy compatibility projection in
`Op::UserTurn`.
- Migrate straightforward app/search/shell/truncation tests from
`SandboxPolicy::{DangerFullAccess, ReadOnly}` to
`PermissionProfile::{Disabled, read_only}`.
- Add a TUI compatibility projection helper for legacy app-server fields
so non-legacy writable roots are preserved instead of being downgraded
to read-only.
- Fix remote start/resume/fork sandbox-mode projection to classify any
managed profile with writable roots as workspace-write, not only
profiles that can write `cwd`.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 47
files to 41 files without changing production behavior.

## Testing

- `cargo check -p codex-core --tests`
- `cargo test -p codex-tui
compatibility_profile_preserves_unbridgeable_write_roots`
- `cargo test -p codex-tui
sandbox_mode_preserves_non_cwd_write_roots_for_remote_sessions`
- `just fmt`
- `just fix -p core_test_support`
- `just fix -p codex-core`
2026-04-28 23:01:40 +00:00
viyatb-oai
2dbde94aa9 fix(network-proxy): normalize network proxy host matching (#19995)
## Why
The proxy matches allow and deny rules against normalized host strings.
Scoped IPv6 literals can arrive in equivalent forms, such as
`fd00::1%eth0`, `[fd00::1%eth0]`, or `[fd00::1%25eth0]`. Policy should
canonicalize those spellings without erasing scope granularity: an
unscoped rule like `fd00::1` should still cover scoped requests for that
address, while a scoped rule like `fd00::1%eth0` should remain exact to
that scope.

## What changed
- preserve IPv6 scope IDs during host normalization and canonicalize
`%25scope` to `%scope`
- match policy against the exact normalized host plus the unscoped IP
base for scoped literals
- keep local-address explicit allow checks aligned with the same
scoped/unscoped semantics
- add focused coverage for scoped IPv6 normalization, scoped allow
rules, and scoped deny rules in `network-proxy`

## Security impact
A request cannot bypass a broad deny rule by adding an IPv6 scope
suffix. At the same time, scoped policy remains precise:
`deny=fd00::1%eth0` affects that scoped spelling without collapsing
`fd00::1%eth1` onto the same key, and `allow=fe80::1%eth0` does not
implicitly allow other scopes.

## Verification
- `just fmt`
- `cargo test -p codex-network-proxy`
- `just fix -p codex-network-proxy`
- `git diff --check`

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: evawong-oai <evawong@openai.com>
2026-04-28 15:50:00 -07:00
Abhinav
3291463ff1 Fix flaky plugin hook env test (#20088)
The test was flaky because it was checking the right thing in a
roundabout way.

What it wanted to prove:
- plugin hooks receive the right environment variables.

What it actually did:
1. Run a plugin hook.
2. Have that hook write those env vars into a temporary `env.json` file.
3. After the hook finished, read `env.json` back from disk.

On Windows, that last file was sometimes not there when the test tried
to read it, so the test failed with `read env log: file not found`. The
hook system itself was not what the test failure was directly proving;
the test was failing on the extra filesystem side effect it introduced.

The fix is to stop using a temp file as the proof mechanism. The hook
now prints the env values in its normal structured output, and the test
asserts on the output that the hook engine already captures. So we still
verify the same behavior, but without depending on a separate file being
created and read back correctly on Windows.
2026-04-28 15:45:26 -07:00
Owen Lin
2e598df6fc fix: don't auto approve git -C ... (#20085)
It's safer to make sure these commands go through approval flows.
2026-04-28 22:06:55 +00:00
canvrno-oai
66b0781502 /plugins: add marketplace install flow (#18704)
This PR adds a new feature to the `/plugins` menu that gives users the
ability to add new plugin marketplaces. It introduces an Add Marketplace
tab to the right of installed marketplaces, a source prompt, loading and
error states, and the app-server request flow needed to perform the
install. After a successful `marketplace/add`, the popup refreshes back
into the newly added marketplace tab so the new plugins are immediately
visible.

- Add an Add Marketplace tab to the `/plugins` menu
- Prompt for marketplace source input from git repo, URL, or local path
- Show loading and error states during `marketplace/add`
- Refresh plugin data after success and switch into the newly added
marketplace tab
- Add tests and snapshot updates
2026-04-28 14:22:39 -07:00
Abhinav
c6e7d564c3 Discover hooks bundled with plugins (#19705)
## Why

Plugins can bundle lifecycle hooks, but Codex previously only discovered
hooks from user, project, and managed config layers. This adds the
plugin discovery and runtime plumbing needed for plugin-bundled hooks
while keeping execution behind the `plugin_hooks` feature flag.

## What

- Discovers plugin hook sources from each plugin's default
`hooks/hooks.json`.
- Supports `plugin.json` manifest `hooks` entries as either relative
paths or inline hook objects.
- Plumbs discovered plugin hook sources through plugin loading into the
hook runtime when `plugin_hooks` is enabled.
- Marks plugin-originated hook runs as `HookSource::Plugin`.
- Injects `PLUGIN_ROOT` and `CLAUDE_PLUGIN_ROOT` into plugin hook
command environments.
- Updates generated schemas and hook source metadata for the plugin hook
source.

## Stack

1. This PR - openai/codex#19705
2. openai/codex#19778
3. openai/codex#19840
4. openai/codex#19882

## Reviewer Notes

- Core logic is in `codex-rs/core-plugins/src/loader.rs` and
`codex-rs/hooks/src/engine/discovery.rs`
- Moved existing / adding new tests to
`codex-rs/core-plugins/src/loader_tests.rs` hence the large diff there
- Otherwise mostly plumbing and minor schema updates

### Core Changes

The `codex-rs/core` changes are limited to wiring plugin hook support
into existing core flows:

- `core/src/session/session.rs` conditionally pulls effective plugin
hook sources and plugin hook load warnings from `PluginsManager` when
`plugin_hooks` is enabled, then passes them into `HooksConfig`.
- `core/src/hook_runtime.rs` adds the `plugin` metric tag for
`HookSource::Plugin`.
- `core/config.schema.json` picks up the new `plugin_hooks` feature
flag, and `core/src/plugins/manager_tests.rs` updates fixtures for the
added plugin hook fields.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 14:17:18 -07:00
cassirer-openai
89698ad1c3 [rollout-trace] Include x-request-id in rollout trace. (#20066)
## Why

Rollout traces need an identifier that can be used to correlate a Codex
inference with upstream Responses API, proxy, and engine logs. The
reduced trace model already exposed `upstream_request_id`, but it was
being populated from the Responses API `response.id`. That value is
useful for `previous_response_id` chaining, but it is not the transport
request id that upstream systems key on.

This PR separates those concepts so trace consumers can reliably answer
both questions:

- which Responses API response did this inference produce?
- which upstream request handled it?

## Structure

The change keeps the upstream request id at the same lifecycle level as
the provider stream:

- `codex-api` captures the `x-request-id` HTTP response header when the
SSE stream is created and exposes it on `ResponseStream`. Fixture and
websocket streams set the field to `None` because they do not have that
HTTP response header.
- `codex-core` carries that stream-level id into `InferenceTraceAttempt`
when recording terminal stream outcomes. Completed, failed, cancelled,
dropped-stream, and pre-response error paths all record the id when it
is available.
- `rollout-trace` now records both identifiers in raw terminal inference
events and response payloads: `response_id` for the Responses API
`response.id`, and `upstream_request_id` for `x-request-id`.
- The reducer stores both fields on `InferenceCall`. It also uses
`response_id` for `previous_response_id` conversation linking, which
removes the old accidental dependency on the misnamed
`upstream_request_id` field.
- Terminal inference reduction now consumes the full terminal payload
(`InferenceCompleted`, `InferenceFailed`, or `InferenceCancelled`) in
one place. That keeps status, partial payloads, response ids, and
upstream request ids consistent across success, failure, cancellation,
and late stream-mapper events.

## Why This Shape

`x-request-id` is a property of the HTTP/provider response envelope, not
an SSE event. Capturing it once in `codex-api` and plumbing it through
terminal trace recording avoids trying to infer the value from stream
contents, and it preserves the id even when the stream fails or is
cancelled after only partial output.

Keeping `response_id` separate from `upstream_request_id` also makes the
reduced trace model less surprising: `response_id` remains the
conversation-continuation id, while `upstream_request_id` is the
operational correlation id for upstream debugging.

## Validation

The PR updates trace and reducer coverage for:

- reading `x-request-id` from SSE response headers;
- storing the true upstream request id on completed inference calls;
- preserving upstream request ids for cancelled and late-cancelled
inference streams;
- keeping `previous_response_id` reconstruction tied to `response_id`
rather than transport request ids.
2026-04-28 21:11:17 +00:00
Ruslan Nigmatullin
10e2a73b3c app-server: disable remote control without sqlite (#20068)
## Why

Remote control depends on the app-server SQLite state DB for persisted
enrollment identity. If the state DB cannot be opened at startup,
continuing with remote control enabled leaves the process in a
misleading state where enrollment identity cannot be read or persisted.

Feature-disabled remote control remains disabled regardless of SQLite
state. This only changes the case where remote control is requested but
the SQLite state DB is unavailable.

## What changed

- Logs SQLite state DB initialization failures instead of dropping the
error silently.
- Treats remote control as effectively disabled when the SQLite state DB
is unavailable.
- Prevents `RemoteControlHandle::set_enabled(true)` from enabling remote
control later in the same process if the state DB was unavailable at
startup.
- Keeps the existing behavior that disabled remote control does not
validate or connect to the remote-control URL.
- Makes persisted enrollment load/update failures propagate as
remote-control errors instead of silently falling back to in-memory
state.
- Makes the direct websocket connection path fail when called without a
SQLite state DB.
- Adds coverage for startup without a state DB, later handle enablement
with no state DB, and direct websocket connection without a state DB.

## Verification

- `cargo test -p codex-app-server transport::remote_control --lib`
- `just fix -p codex-app-server`
2026-04-28 13:49:00 -07:00
Michael Bolin
3b74a4d3b1 tui: use permission profiles for sandbox state (#20008)
## Summary
- Move TUI permission state from legacy `SandboxPolicy` values to
canonical `PermissionProfile` values across presets, app events, chat
widget state, app commands, thread routing, and cached thread session
state.
- Keep app-server compatibility boundaries explicit: embedded sessions
send `permissionProfile`, while remote sessions send only a legacy
`sandbox` projection and fall back to read-only when a custom profile
cannot be projected.
- Update status/add-dir UI summaries and snapshots to render the active
permission profile, including workspace profiles selected by the new
built-in defaults.

## Verification
- `rg '\bSandboxPolicy\b' codex-rs/tui -n` returns no matches.
- `cargo test -p codex-tui`
- `cargo check -p codex-tui --tests`
- `cargo test -p codex-tui additional_dirs`
- `just fmt`
- `just fix -p codex-tui`




































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/20008).
* #20041
* #20040
* #20037
* #20035
* #20034
* #20033
* #20032
* #20030
* #20028
* #20027
* #20026
* #20024
* #20021
* #20018
* #20016
* #20015
* #20013
* #20011
* #20010
* __->__ #20008
2026-04-28 20:36:48 +00:00
jif-oai
34d71d43eb Make MultiAgentV2 wait minimum configurable (#20052)
## Why

MultiAgentV2 `wait_agent` currently clamps short waits to a fixed 10
second minimum. That default is still useful for preventing tight
polling loops, but it is too rigid for environments that need faster
mailbox wake-up checks or a larger minimum to discourage frequent
polling.

This PR makes the minimum wait timeout configurable from the existing
MultiAgentV2 feature config section, so operators can tune the behavior
without changing the legacy multi-agent tool surface.

## What Changed

- Added `features.multi_agent_v2.min_wait_timeout_ms`.
- Defaulted the new setting to the existing 10 second floor.
- Validated the configured value as `1..=3600000`, matching the existing
one hour maximum wait bound.
- Applied the configured minimum to MultiAgentV2 `wait_agent` runtime
clamping.
- Plumbed the configured minimum into the `wait_agent` tool schema,
including the effective default when the minimum is above the normal 30
second default.
- Regenerated `core/config.schema.json`.

## Verification

- `cargo test -p codex-features`
- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib multi_agent_v2`
- `just fix -p codex-core`
2026-04-28 22:36:44 +02:00
Ruslan Nigmatullin
1de7a9bf69 app-server: allow remote_control runtime feature override (#20047) 2026-04-28 13:36:12 -07:00
viyatb-oai
e1ba87ccb2 fix(network-proxy): recheck network proxy connect targets (#19999)
## Why
The proxy checks the requested host before opening the upstream
connection, but DNS can resolve an allowed hostname to a loopback,
private, or other non-public address after that first decision. Without
a final check on the actual socket target, a request that looks
acceptable at the hostname layer can still connect to a local service
once resolution completes.

## What changed
- add a shared TCP connector check for direct proxy egress
- use that path for HTTP, `CONNECT`, SOCKS5, and MITM upstream
connections
- keep configured upstream proxy hops on the existing proxy path
- add direct-connector coverage for allowed and rejected local targets

## Security impact
Direct proxy egress now rechecks the resolved socket address before
connecting, closing the gap between hostname policy evaluation and the
final network target.

## Verification
- `cargo test -p codex-network-proxy`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 12:51:43 -07:00
Shijie Rao
25ac0e4527 Load cloud requirements for agent identity (#19708)
## Why

Agent Identity sessions can represent Business and Enterprise ChatGPT
workspaces, but cloud requirements were skipped before fetch. That meant
workspace-managed requirements were not loaded for Agent Identity even
when the JWT carried the same account identity and plan information that
normal ChatGPT token auth exposes.

This PR now sits on top of the Agent Identity stack through
[#19764](https://github.com/openai/codex/pull/19764). Because
[#19763](https://github.com/openai/codex/pull/19763) moved task
registration into Agent Identity auth loading, cloud requirements no
longer needs a separate runtime-initialization step before building the
backend client.

## What changed

- Stop skipping `CodexAuth::AgentIdentity` in the cloud requirements
loader.
- Share the cloud requirements eligibility check between startup load
and background cache refresh.
- Rely on eagerly loaded Agent Identity auth so backend requests can
attach task-scoped `AgentAssertion` headers.
- Decode Agent Identity JWT `plan_type` as the auth-layer plan type,
then convert it through a shared `auth::PlanType` -> `account::PlanType`
mapping.
- Add the missing serde alias for the `education` plan string and add
coverage for raw Agent Identity plan aliases such as `hc` and
`education`.

## Testing

- `cargo test -p codex-agent-identity -p codex-login -p
codex-cloud-requirements -p codex-protocol`
2026-04-28 12:35:00 -07:00
Ruslan Nigmatullin
0700f979ba app-server: run initialized rpcs with keyed serialization (#17373)
## Why

Initialized app-server RPCs no longer need to bottleneck behind one
request processor path. Running them concurrently improves
responsiveness, but several request families still mutate shared state
or depend on ordered side effects. Those stateful families need an
auditable serialization contract so concurrency does not reorder thread,
config, auth, command, watcher, MCP, or similar state transitions.

This PR keeps that boundary explicit: stateful work is serialized by the
smallest useful key, while intentionally read-only or externally
concurrent work remains unkeyed. In particular, `thread/list` and
`thread/turns/list` explicitly have no serialization because they
primarily read append-only rollout storage and should continue to be
served concurrently.

## What changed

- Adds `ClientRequest::serialization_scope()` in `app-server-protocol`
and requires every client request definition to declare its
serialization behavior.
- Introduces keyed request scopes for thread, thread path, command exec
process, fuzzy search session, fs watch, MCP OAuth, and global state
buckets such as config, account auth, memory, and device keys.
- Routes initialized app-server RPCs through per-key FIFO serialization
while allowing unkeyed initialized requests to run concurrently.
- Cancels in-flight initialized RPC work when the connection disconnects
or the app-server exits so spawned request tasks do not outlive their
session.
- Adds focused coverage for representative keyed and unkeyed
serialization scopes, including explicitly concurrent
`thread/turns/list` behavior.

## Validation

- Added protocol tests for representative keyed serialization scopes and
intentionally unkeyed request families.
- Added app-server request serialization tests covering per-key FIFO
behavior, concurrent unkeyed execution, disconnect shutdown, and config
read-after-write ordering.
- Local focused protocol validation after the latest rebase is currently
blocked by packageproxy failing to resolve locked `rustls-webpki
0.103.13`; CI is expected to provide the full validation signal.
2026-04-28 12:23:34 -07:00
Dylan Hurd
7f7c7c2c07 Fix log db batch flush flake (#19959)
## Why

The log DB writer batches tracing events before inserting them into
SQLite, but `tokio::time::interval` produces an immediate first tick.
That meant the inserter could flush the first accepted log entry before
`batch_size` was reached, making
`configured_batch_size_flushes_without_explicit_flush` timing-sensitive
in CI.

## What Changed

- Consume the interval's startup tick before entering the inserter loop,
so interval flushing starts after the configured delay.
- Remove the test's startup sleep, which was masking the race instead of
proving the batch-size behavior.

## Validation

- `cargo test -p codex-state`
- `cargo test -p codex-state
configured_batch_size_flushes_without_explicit_flush` passed 3
consecutive focused runs
- PR checks passed across `rust-ci`, Bazel, `ci`, `sdk`, `cargo-deny`,
Codespell, blob-size policy, and CLA
2026-04-28 12:08:41 -07:00
viyatb-oai
3377afd84a fix(network-proxy): harden linux proxy bridge helpers (#20001)
## Why
The Linux managed-proxy bridge helpers are long-lived child processes in
the sandbox networking path. Before this change they stayed dumpable and
the network seccomp profile did not block cross-process memory syscalls,
so another same-user process could potentially inspect or modify bridge
memory instead of interacting only through the intended proxy interface.

## What changed
- reuse the shared `codex-process-hardening` helper to mark bridge
helper children non-dumpable before they begin serving
- deny `process_vm_readv` and `process_vm_writev` in the existing
network seccomp filter

## Security impact
Bridge helpers are less exposed to same-user cross-process inspection or
memory writes, which reduces the chance that sandboxed code can
interfere with proxy support processes outside the intended IPC path.

## Verification
- `cargo test -p codex-process-hardening`
- `cargo test -p codex-linux-sandbox`
- attempted `cargo check -p codex-linux-sandbox --target
x86_64-unknown-linux-gnu`; blocked on missing `x86_64-linux-gnu-gcc` on
this macOS host

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 11:52:50 -07:00
charley-openai
de2ccf9473 [codex] Add token usage to turn tracing spans (#19432)
## Why

Slow Codex turns are easier to debug when token usage is visible in the
trace itself, without joining against separate analytics. This adds
token usage to existing turn-handling spans for regular user turns only.

[Example
turn](https://openai.datadoghq.com/apm/trace/9d353efa2cb5de1f4c5b93dc33c3df04?colorBy=service&graphType=flamegraph&shouldShowLegend=true&sort=time&spanID=3555541504891512675&spanViewType=metadata&traceQuery=)
<img width="1447" height="967" alt="Screenshot 2026-04-24 at 3 03 07 PM"
src="https://github.com/user-attachments/assets/ab7bb187-e7fc-41f0-a366-6c44610b2b2c"
/>

## What Changed

Added response-level token fields on completed handle_responses spans:

gen_ai.usage.input_tokens
gen_ai.usage.cache_read.input_tokens
gen_ai.usage.output_tokens
codex.usage.reasoning_output_tokens
codex.usage.total_tokens
Added aggregate token fields on regular turn spans:

codex.turn.token_usage.*
Added an explicit regular-turn opt-in via
SessionTask::records_turn_token_usage_on_span() so this is not coupled
to span-name strings.

## Testing

- `cargo test -p codex-otel`
- `cargo test -p codex-core
turn_and_completed_response_spans_record_token_usage`
- `just fmt`
- `just fix -p codex-core`
- `just fix -p codex-otel`
- Manual local Electron/app-server smoke test: regular user turn emits
the new span fields

Known status: `cargo test -p codex-core` was attempted and failed in
unrelated existing areas: config approvals, request-permissions,
git-info ordering, and subagent metadata persistence.
2026-04-28 11:41:32 -07:00
canvrno-oai
640a1b23ea Fix plan mode nudge test after task completion signature change (#20045)
Updates the plan mode nudge test to pass the new `duration_ms` argument
to task completion.

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 11:24:22 -07:00
Michael Bolin
9e26613657 permissions: add built-in default profiles (#19900)
## Why

The migration away from `SandboxPolicy` needs new configs to start from
permissions profiles instead of deriving profiles from legacy sandbox
modes. Existing users can have empty `config.toml` files, and we should
not rewrite user-owned config files that may live in shared
repositories.

This PR introduces built-in profile names so an empty config can resolve
to a canonical `PermissionProfile`, while explicit named `[permissions]`
profiles still behave predictably.

## What changed

- Adds built-in `default_permissions` profile names:
  - `:read-only` maps to `PermissionProfile::read_only()`.
- `:workspace` maps to the workspace-write profile, including
project-root metadata carveouts.
- `:danger-no-sandbox` maps to `PermissionProfile::Disabled`, preserving
the distinction between no sandbox and a broad managed sandbox.
- Reserves the `:` prefix for built-in profiles so user-defined
`[permissions]` profiles cannot collide with future built-ins.
- Allows `default_permissions` to reference a built-in profile without
requiring a `[permissions]` table.
- Makes an otherwise empty config choose a built-in profile by
trust/platform context: trusted or untrusted project roots use
`:workspace` when the platform supports that sandbox, while roots
without a trust decision use `:read-only`.
- Keeps legacy `sandbox_mode` configs on the legacy path, and still
rejects user-defined `[permissions]` profiles that omit
`default_permissions` so we do not silently guess among custom profiles.
- Preserves compatibility behavior for implicit defaults: bare
`network.enabled = true` allows runtime network without starting the
managed proxy, explicit profile proxy policy still starts the proxy, and
implicit workspace/add-dir roots keep legacy metadata carveouts.

## Verification

- `cargo test -p codex-core builtin --lib`
- `cargo test -p codex-core profile_network_proxy_config`
- `cargo test -p codex-core
implicit_builtin_workspace_profile_preserves_add_dir_metadata_carveouts`
- `cargo test -p codex-core
permissions_profiles_network_enabled_allows_runtime_network_without_proxy`
- `cargo test -p codex-core
permissions_profiles_proxy_policy_starts_managed_network_proxy`

## Documentation

Public Codex config docs should mention these built-in names when the
`[permissions]` config format is ready to document as stable.









---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19900).
* #20041
* #20040
* #20037
* #20035
* #20034
* #20033
* #20032
* #20030
* #20028
* #20027
* #20026
* #20024
* #20021
* #20018
* #20016
* #20015
* #20013
* #20011
* #20010
* #20008
* __->__ #19900
2026-04-28 11:21:39 -07:00
viyatb-oai
3afb185a4f fix(network-proxy): tighten network proxy bypass defaults (#20002)
## Why
Managed sessions use `NO_PROXY` to keep a small set of destinations on
the direct path by default. The old default also bypassed all IPv4
link-local addresses in `169.254.0.0/16`, which includes metadata
endpoints such as `169.254.169.254`. Because `NO_PROXY` is evaluated by
the client before the request reaches the managed proxy, requests to
that range could skip proxy-side allowlist and local-binding checks
entirely. On hosts where a link-local metadata service is reachable,
that creates a path to sensitive environment metadata or credentials
outside the intended enforcement point.

## What changed
- remove the default IPv4 link-local `169.254.0.0/16` bypass from the
managed proxy environment
- keep the existing loopback and private-network defaults unchanged
- update the regression assertion to lock in the narrower default

## Security impact
Link-local requests now stay on the managed-proxy path by default, so
the proxy can apply configured policy before they reach metadata-style
endpoints or other link-local services.

## Verification
- `cargo test -p codex-network-proxy`

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 10:51:43 -07:00
stefanstokic-oai
4c68bd728f External agent session support (#19895)
## Summary

This extends external agent detection/import beyond config artifacts so
Codex can detect recent sessions files from the external agent home and
import them into Codex rollout history.

## What changed

- Added a focused `external_agent_sessions` module for:
  - session discovery
  - source-record parsing
  - rollout construction
  - import ledger tracking
- Wired session detection/import into the app-server external agent
config API.
- Added compaction handling so large imported sessions can be resumed
safely before the first follow-up turn.

## Testing

Added coverage for:
- recent-session detection
- custom-title handling
- recency filtering
- dedupe and re-detect-after-source-change behavior
- visible imported turn construction
- backward-compatible import payload deserialization
- end-to-end RPC import flow
- rejection of undetected session paths
- repeat-import behavior
- large-session compaction before first follow-up

Ran:
- `cargo test -p codex-app-server external_agent_config_import_ --test
all`
2026-04-28 17:42:36 +00:00
Felipe Coury
a036584104 fix(tui): let esc exit empty shell mode (#19986)
## Summary

- exit shell mode when `Esc` is pressed while the absorbed `!` is the
only input
- add direct regression coverage plus a composer snapshot for the
restored normal prompt state

## Root cause

Shell mode stores the leading `!` outside the editable textarea. After
typing only `!`, the textarea is empty but the composer is still in bash
mode, so the existing empty-composer `Esc` handling never runs.

## Validation

- `just fmt`
- `cargo test -p codex-tui
bottom_pane::chat_composer::tests::esc_exits_empty_shell_mode`
- `cargo test -p codex-tui
bottom_pane::chat_composer::tests::footer_mode_snapshots`
- `cargo insta pending-snapshots`

`cargo test -p codex-tui` still reports unrelated existing `/status`
snapshot drift in this local environment because the rendered
permissions text is `workspace-write with network access` instead of the
older `read-only` fixture text.
2026-04-28 14:35:24 -03:00
canvrno-oai
bc5a1b961e Move local /resume cwd filtering into thread/list (#19931)
Move local resume and fork cwd filtering to `thread/list` instead of
filtering in the TUI. This makes the `/resume` menu feel slightly faster
to load when working in repos with many historical threads, and
centralizes the cwd filtering in app-server.

**Affected:**
- /resume from inside the TUI.
- codex resume with no session ID and without --last
- codex resume --all
- codex fork with no session ID and without --last
- codex fork --all

**Not affected:**
- codex resume <id>
- codex fork <id>
- codex resume --last
- codex fork --last

Steps to test performance improvement in a real Codex environment:
- Launch `codex resume` using compiled binary in a directory that has
seen many threads.
- Launch `codex resume` using release binary in same directory.
- Observe difference in time-to-full-page as threads load.
2026-04-28 10:35:10 -07:00
Felipe Coury
c6bcd27832 feat(tui): suggest plan mode from composer drafts (#19901)
## Summary

- suggest Plan mode when the current composer draft contains the
standalone word `plan`
- shares the Codex App heuristics for detection
- excludes things line `/plan` and the word plan in shell mode
- reuse the existing `Shift+Tab` mode cycle and add thread-scoped
dismissal with `Esc`
- replace the normal footer hint while the reminder is visible so the
statusline stays anchored


https://github.com/user-attachments/assets/01123ae8-cee6-4e95-b563-44655c071cde

## Why

The desktop app already nudges users toward Plan mode when their draft
clearly signals planning intent. The TUI had the underlying `/plan` and
`Shift+Tab` flows, but no equivalent reminder at the moment the user was
most likely to benefit from them.

## Details

The reminder is shown only when Plan mode is available, the draft
contains standalone `plan`, the user is not already in Plan mode, the
composer is actionable, and the current thread has not dismissed the
reminder. Slash-command and shell-command drafts are excluded.

The first implementation used an extra composer row, but that moved the
statusline whenever the heuristic fired. This version keeps the layout
stable by rendering the reminder in the existing footer row instead.

## Validation

- `INSTA_UPDATE=always cargo test -p codex-tui
chatwidget::tests::plan_mode::plan_mode_nudge -- --nocapture`
- `just fmt`
- `just fix -p codex-tui`
- `./tools/argument-comment-lint/run.py -p codex-tui`
- `cargo insta pending-snapshots`
- `git diff --check`
2026-04-28 14:34:10 -03:00
maja-openai
273c2e21a9 Clarify network approval auto-review prompts (#19907)
## Why

Network access approval prompts were showing the generic retry reason,
which made auto-review focus on the blocked connection instead of the
command that caused it. This makes network approvals easier to assess by
telling the reviewer to evaluate whether the triggering command was
authorised by the user and within policy, and to treat the network call
as acceptable when it is a reasonable consequence of that command.

## What changed

- Split guardian approval request prompt rendering so `NetworkAccess`
has a dedicated branch.
- For network requests, show `Network approval context` and `Network
access JSON` instead of `Retry reason` / `Planned action JSON`.
- Added regression coverage for the network approval prompt wording and
for omitting retry reason in this case.

## Verification

- `cargo test -p codex-core
guardian::tests::build_guardian_prompt_items_explains_network_access_review_scope`
2026-04-28 10:25:37 -07:00
mchen-oai
01de13b7e6 Record MCP result telemetry on mcp.tools.call spans (#19509)
## Why
- Without change: MCP tool call spans include request-side details such
as server, tool, call ID, connector, session, and turn.
- Issue: Some useful telemetry is only known by the MCP server after it
handles the tool call, such as target identity or whether the call
triggered a user-facing flow.

## What Changed
- With change: Codex reads allowlisted telemetry from
`_meta["codex/telemetry"]["span"]` and records it on the
`mcp.tools.call` span.
- Adds span fields for `codex.mcp.target.id` and
`codex.mcp.user_flow.triggered`, with strict type checks and bounded
target ID length.


## Verification
`codex-rs/core/src/mcp_tool_call_tests.rs`
2026-04-28 17:20:38 +00:00
evawong-oai
0670d8971a Enforce workspace metadata protections in Seatbelt (#19847)
## Summary

Translate FileSystemSandboxPolicy project root metadata carveouts into
macOS Seatbelt rules.

## Scope

1. Thread protected metadata names into Seatbelt access roots.
2. Ask FileSystemSandboxPolicy whether each metadata carveout is
writable.
3. Emit Seatbelt deny rules that block creating or replacing protected
metadata names under writable roots.
4. Add coverage for first time metadata creation and read only
carveouts.

## Reviewer Focus

1. This PR only covers the macOS sandbox adapter.
2. The policy decision comes from FileSystemSandboxPolicy.
3. Read only subpath carveouts and metadata protection checks should
compose cleanly.

## Stack

1. Policy primitive: #19846
2. macOS Seatbelt adapter: this PR
3. Shell preflight UX: #19848
4. Runtime profile propagation: #19849
5. Linux bubblewrap adapter: #19852

## Validation

1. formatting for codex sandboxing
2. codex sandboxing package tests
2026-04-28 10:13:00 -07:00
efrazer-oai
f6797c3ac6 feat: verify agent identity JWTs with JWKS (#19764) 2026-04-28 09:56:20 -07:00
colby-oai
6138063656 Strip connector provenance metadata from custom MCP tools (#19875)
# Summary 
This prevents non-codex_apps MCP servers from spoofing connector
provenance metadata.
2026-04-28 12:43:26 -04:00
mchen-oai
ccec84b148 Add turn start timestamp to turn metadata (#19473)
## Why
- Without change: MCP tool calls receive
`_meta["x-codex-turn-metadata"]` with `session_id` and `turn_id`.
- Issue: MCP servers may want the turn start timestamp to measure
internal latency relative to turn start.

## What Changed
- With change: turn metadata now includes `turn_started_at_unix_ms`,
which is propagated to MCP tool calls in
`_meta["x-codex-turn-metadata"]`.

## Verification
- `codex-rs/core/src/mcp_tool_call_tests.rs`
- `codex-rs/core/src/turn_metadata_tests.rs`
- `codex-rs/core/src/turn_timing_tests.rs`
- `codex-rs/core/tests/responses_headers.rs`
- `codex-rs/core/tests/suite/search_tool.rs`
2026-04-28 16:36:59 +00:00
Eric Traut
4e0cf945b7 Terminate stdio MCP servers on shutdown to avoid process leaks (#19753)
## Why

Several bug reports describe thread shutdown (including subagent
threads) leaving stdio MCP server processes behind. These reports all
point at the same lifecycle gap: Codex launches stdio MCP servers, but
the session-level shutdown path does not explicitly close MCP clients or
terminate the server process tree.

Fixes #12491
Fixes #12976
Fixes #18881
Fixes #19469

## History

This is best understood as a regression/coverage gap in MCP session
lifecycle management, not as stdio MCP cleanup being absent all along.
#10710 added process-group cleanup for stdio MCP servers, but that
cleanup only runs when the `RmcpClient`/transport is dropped. The older
reports (#12491 and #12976) came after that cleanup existed, which
suggests the remaining problem was that some higher-level shutdown paths
kept the MCP manager alive or replaced it without explicitly draining
clients. The newer reports (#18881 and #19469) exposed the same family
around manager replacement and shutdown.

## What changed

- Added an explicit stdio MCP process handle in `codex-rmcp-client` so
local MCP servers terminate their process group and executor-backed MCP
servers call the executor process terminator.
- Added `RmcpClient::shutdown()` and manager-level MCP shutdown draining
so session shutdown, channel-close fallback, MCP refresh, and connector
probing stop owned MCP clients.
- Added regression coverage that starts a stdio MCP server, begins an
in-flight blocking tool call, shuts down the client, and asserts the
server process exits.

## Verification

- `cargo test -p codex-rmcp-client`
- `cargo test -p codex-mcp`
- `just fix -p codex-rmcp-client`
- `just fix -p codex-mcp`
- `just fix -p codex-core`

- Manual before/after validation with a temporary repro script:
- Pre-fix binary from `HEAD^` (`fed0a8f4fa`): reproduced the leak with
surviving MCP server and child PIDs, `survivors=[77583, 77592]`,
`leaked=true`.
- Post-fix binary from this branch (`67e318148b`): verified both MCP
processes were gone after interrupting `codex exec`, `survivors=[]`,
`leaked=false`.
2026-04-28 09:29:57 -07:00
Eric Traut
087c9c1f1f TUI: use cumulative turn duration for worked-for separator (#19929)
## Why

Fixes #19814.

The TUI's current `Worked for ...` timing behavior is a leftover from
#9599. At that point, models could emit multiple assistant messages in
one turn for preambles/commentary, but the TUI did not yet have a
reliable signal that an assistant message was the final answer when it
started streaming. To avoid showing an ever-growing elapsed time on each
preamble separator, #9599 made the separator timer incremental by
tracking elapsed time since the previous separator.

That workaround is no longer the right model for the final
completed-turn display. Since then, #16638 added protocol-native turn
timing, including `duration_ms` on turn completion. With that cumulative
duration available at the point where the TUI renders the completed-turn
separator, the UI can show the actual turn duration directly instead of
carrying per-separator timing state.

## What Changed

- Thread `duration_ms` into `ChatWidget::on_task_complete` from both
legacy `TurnCompleteEvent` handling and app-server `TurnCompleted`
notifications.
- Use `duration_ms` for the final `Worked for ...` separator, falling
back to the status indicator timer only when the protocol duration is
unavailable.
- Keep mid-turn separators before later assistant text as plain visual
dividers instead of clocked `Worked for ...` separators.
- Remove the old incremental separator timer state and helper
(`last_separator_elapsed_secs` / `worked_elapsed_from`).
- Add a snapshot regression test for a turn that runs a command and then
completes with a final answer, verifying the final separator uses the
cumulative turn duration.

## Verification

- `cargo test -p codex-tui
final_worked_for_uses_cumulative_turn_duration_snapshot`
- `just fix -p codex-tui`

Manual repro prompt:

```text
Manual timing repro. First send a short preamble/commentary sentence before using tools. Then run exactly this shell command: sleep 75; echo MANUAL_TIMING_DONE. After the command finishes, give a final answer that says "done". Do not skip the preamble.
```

After this change, the mid-turn break before the final answer should be
a plain divider, and the final completed-turn separator should show
`Worked for ...` using the cumulative turn duration.

Before:
<img width="414" height="102" alt="Screenshot 2026-04-27 at 10 09 01 PM"
src="https://github.com/user-attachments/assets/b9e2ce01-2460-40e4-a5c4-c9ba8add2557"
/>


After:
<img width="485" height="149" alt="Screenshot 2026-04-27 at 10 09 07 PM"
src="https://github.com/user-attachments/assets/d24089ae-d4e2-41b6-b966-07c98706ead4"
/>
2026-04-28 09:24:29 -07:00
jif-oai
5b7d6f5c4f feat: house-keeping memories 3 (#20005)
Move stuff in memories, no behavioural change expected
2026-04-28 18:13:35 +02:00
evawong-oai
0156b1e61f [sandbox] Enforce protected workspace metadata paths (#19846)
## Summary

Make FileSystemSandboxPolicy the semantic source of truth for project
root metadata protection. Under writable roots, `.git`, `.codex`, and
`.agents` stay protected unless user policy grants an explicit write
rule for that metadata path.

## Scope

1. Add `protected_metadata_names` to `WritableRoot`.
2. Teach `FileSystemSandboxPolicy::can_write_path_with_cwd` to reject
protected metadata writes under writable roots unless explicitly
allowed.
3. Default workspace write profiles to protect `.git`, `.codex`, and
`.agents`.
4. Add the Linux fallback setup needed before Linux enforcement lands
later in the stack.

## Reviewer Focus

1. The policy decision belongs in FileSystemSandboxPolicy, not shell
command parsing.
2. Legacy SandboxPolicy remains a compatibility projection, not the
source of the new rule.
3. Explicit user write rules can still opt into these metadata paths.

## Stack

1. Policy primitive: this PR
2. macOS Seatbelt adapter: #19847
3. Shell preflight UX: #19848
4. Runtime profile propagation: #19849
5. Linux bubblewrap adapter: #19852

## Validation

1. codex protocol permissions tests
2. formatting for codex protocol and codex linux sandbox
3. diff whitespace check
2026-04-28 09:10:41 -07:00
Felipe Coury
5e737372ee feat(tui): add configurable keymap support (#18593)
## Why

The TUI currently handles keyboard shortcuts as hard-coded event matches
spread across app, composer, pager, list, approval, and navigation code.
That makes shortcuts hard to customize, makes displayed hints easy to
drift from actual behavior, and makes future keymap work riskier because
there is no central action inventory.

This PR adds the foundation for configurable, action-based keymaps
without adding the interactive remapping UI yet. Onboarding
intentionally stays on fixed startup shortcuts because users cannot
reasonably configure keymaps before completing onboarding.

This is PR1 in the keymap stack:

- PR1: #18593: configurable keymap foundation
- PR2: #18594: `/keymap` picker and guided remapping UI
- PR3: #18595: Vim composer mode and the remap option

## Design Notes

The new model resolves named actions into concrete runtime bindings once
from config, then passes those bindings to the UI surfaces that handle
input or render shortcut hints.

The main concepts are:

- **Context**: a scope where an action is active, such as `global`,
`chat`, `composer`, `editor`, `pager`, `list`, or `approval`.
- **Action**: a named operation inside a context, such as
`global.open_transcript`, `composer.submit`, or `pager.close`.
- **Binding**: one or more single-key shortcuts assigned to an action,
written as config strings such as `ctrl-t`, `alt-backspace`, or
`page-down`. Multi-step sequences such as `ctrl-x ctrl-s`, `g g`, or
leader-key flows are not part of this PR.
- **Resolution order**: context-specific config wins first, supported
global fallbacks come next, and built-in defaults fill in anything
unset.
- **Explicit unbinding**: an empty array removes an action binding in
that scope and does not fall through to a fallback binding.
- **Conflict validation**: a resolved keymap rejects duplicate active
bindings inside the same scope so one keypress cannot dispatch two
actions.

## What Changed

- Added `TuiKeymap` config support under `[tui.keymap]`, including typed
contexts/actions, key alias normalization, generated schema coverage,
and user-facing config errors.
- Added `RuntimeKeymap` resolution in `codex-rs/tui/src/keymap.rs`,
including fallback precedence, built-in defaults, explicit unbinding,
and per-context conflict validation.
- Rewired existing TUI handlers to consume resolved keymap actions
instead of directly matching hard-coded keys in each component.
- Updated key hint rendering and footer/pager/list surfaces so displayed
shortcuts follow the resolved keymap.
- Kept onboarding shortcuts fixed in
`codex-rs/tui/src/onboarding/keys.rs` instead of exposing them through
`[tui.keymap]`.

## Validation

The branch includes focused coverage for config parsing, key
normalization, runtime fallback resolution, explicit unbinding,
duplicate-key conflict validation, default keymap consistency,
onboarding startup key behavior, and UI hint snapshots affected by
resolved key bindings.
2026-04-28 12:52:25 -03:00
Eric Traut
a61c785040 Reset TUI keyboard reporting on exit (#19625)
## Why

Codex enables enhanced keyboard reporting while the TUI owns the
terminal. In iTerm2, exiting the TUI with Ctrl+C can intermittently
leave the parent shell receiving raw CSI-u / `modifyOtherKeys` fragments
instead of normal key input.

Final terminal cleanup should put the parent shell back into normal
keyboard reporting even if the terminal misses the usual stack pop.

Fixes #19553.

## What Changed

- Move TUI keyboard enhancement setup and detection into
`tui/src/tui/keyboard_modes.rs`.
- Add an exit-only `restore_after_exit()` path that performs the normal
keyboard enhancement pop plus unconditional keyboard enhancement and
`modifyOtherKeys` resets.
- Keep temporary restore paths, such as external-editor handoff, using
the balanced stack pop behavior.

## Confidence

Medium. This is a speculative fix: I was not able to reproduce the
reported iTerm2 behavior manually, but the symptoms line up with
terminal keyboard reporting state surviving Codex exit. The added reset
sequences are scoped to final TUI shutdown and should be harmless when
the terminal is already clean.
2026-04-28 08:51:44 -07:00
friel-openai
598bbcdb58 Preserve assistant phase for replayed messages (#19832) 2026-04-28 08:46:13 -07:00
jif-oai
21e19912e0 feat: house-keeping memories 2 (#20000)
Just move metrics in a dedicated file
2026-04-28 17:26:44 +02:00
jif-oai
5a79dfab7c feat: house-keeping memories 1 (#19998)
Just move metrics in a dedicated file
2026-04-28 17:11:49 +02:00
jif-oai
1b74360365 feat: skip memory startup when Codex rate limits are low (#19990)
## Why

Memory startup runs in the background after an eligible turn, but it can
consume Codex backend quota at exactly the wrong time: when the user is
already near a rate-limit boundary. This PR adds a guard so the memory
pipeline backs off when the Codex rate-limit snapshot says the remaining
budget is too low.

## What Changed

- Added `memories.min_rate_limit_remaining_percent` with a default of
`25`, clamped to `0..=100`, and regenerated `core/config.schema.json`.
- Added `codex-rs/memories/write/src/guard.rs`, which fetches Codex
backend rate limits before memory startup and skips phase 1 / phase 2
when the Codex limit is reached or either tracked window is above the
configured usage ceiling.
- Keeps startup best-effort: non-Codex auth or rate-limit fetch/client
failures preserve the existing memory startup behavior.
- Records a `codex.memory.startup` counter with
`status=skipped_rate_limit` when startup is skipped.
- Added config parsing/clamping coverage and guard unit tests.

## Verification

- Added `codex-rs/memories/write/src/guard_tests.rs` for threshold,
primary/secondary window, and reached-limit behavior.
- Added config tests for TOML parsing and clamping.
2026-04-28 17:07:16 +02:00
efrazer-oai
0e8d6b8765 fix: configure AgentIdentity AuthAPI base URL (#19904)
## Summary

AgentIdentity runtime loading currently registers tasks against a single
hardcoded AuthAPI base URL. That works for production, but local and
staging validation may need registration to target a different
authapi-login-provider without baking internal staging service URLs into
the OSS binary.

This PR adds a small config surface for
`agent_identity_authapi_base_url` and threads it through the existing
auth-loading path as a direct argument. Explicit config wins. Without
config, task registration keeps using the production AuthAPI URL,
matching the current default behavior.

## Stack

1. openai/codex#19762 - `refactor: make auth loading async` (merged)
2. openai/codex#19763 - `refactor: load agent identity runtime eagerly`
3. This PR - `fix: configure AgentIdentity AuthAPI base URL`
4. openai/codex#19764 - `feat: verify agent identity JWTs with JWKS`

## Design decisions

- Keep the existing auth-loading shape and pass the new value as an
argument. This avoids another wrapper loader and keeps the call path
readable.
- Add config instead of embedding internal staging URLs. Environments
that need a non-production AuthAPI can configure it explicitly.
- Keep the default AuthAPI registration URL as production.
`chatgpt_base_url` remains separate and is used by the follow-up JWKS
verification PR for fetching public keys from the ChatGPT backend route.
- Resolve the AuthAPI base URL inside AgentIdentity loading, because
task registration is the only consumer of this value.

## Testing

Tests: targeted Rust checks, AgentIdentity auth tests, config schema
regeneration, formatter/fix pass, and whitespace diff check.
2026-04-28 08:06:45 -07:00
jif-oai
a9e5c34083 feat: trigger memories from user turns with cooldown (#19970)
## Why

Memory startup was tied to thread lifecycle events such as create, load,
and fork. That can run memory work before a thread receives real user
input, and it makes startup cost scale with thread management instead of
actual turns. Moving the trigger to `thread/sendInput` keeps memory
startup aligned with the first real user turn and lets it use the
current thread config at turn time.

The idea is to prevent ghost cost due to pre-warm triggered by the app

Turn-based startup can also make global phase-2 consolidation easier to
request repeatedly, so this adds a success cooldown and tightens the
default startup scan window.

## What Changed

- Start `codex_memories_write::start_memories_startup_task` after a
non-empty `thread/sendInput` turn is submitted, instead of from thread
create/load/fork paths:
d4a6885b78/codex-rs/app-server/src/codex_message_processor.rs (L6477-L6487)
- Expose `CodexThread::config()` so app-server can pass the live config
into memory startup at turn time.
- Add a six-hour successful-run cooldown for global phase-2
consolidation via `SkippedCooldown`:
d4a6885b78/codex-rs/state/src/runtime/memories.rs (L963-L966)
- Reduce memory startup defaults to at most 2 rollouts over 10 days:
d4a6885b78/codex-rs/config/src/types.rs (L31-L34)

## Verification

Updated the memory runtime coverage around phase-2 reclaim behavior,
including `phase2_global_lock_respects_success_cooldown`.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 16:23:13 +02:00
jif-oai
fa127be25f Stabilize memory Phase 2 input ordering (#19967)
## Why

Phase 2 still needs to choose the most relevant stage-1 memory outputs
by usage and recency, but exposing that ranking as the rendered
`raw_memories.md` order creates unnecessary large diff. Usage-count or
timestamp changes can reshuffle otherwise unchanged memories, making the
workspace diff noisy and giving the consolidation prompt a misleading
recency signal from file position.
This fix will reduce token consumption

## What Changed

- Keep the existing top-N Phase 2 selection ranking by `usage_count`,
`last_usage`, `source_updated_at`, and `thread_id`.
- Return the selected rows in stable ascending `thread_id` order before
syncing Phase 2 filesystem inputs.
- Update the memory README, raw memories header, and consolidation
prompt so they describe the stable order and tell the prompt to use
metadata and workspace diffs instead of file order as the recency
signal.
- Adjust the memory runtime tests to use deterministic thread IDs and
assert the stable return order separately from the ranked selection
semantics.

## Test Coverage

- Existing memory runtime tests in
`codex-rs/state/src/runtime/memories.rs` now cover the stable returned
ordering for Phase 2 inputs.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 13:32:05 +02:00
jif-oai
54d1401170 feat: fix hinting 3 (#19963)
Fix https://github.com/openai/codex/pull/19805#discussion_r3153265562
2026-04-28 13:12:51 +02:00
jif-oai
b7c0f26910 feat: fix hinting 2 (#19961)
Fix this:
https://github.com/openai/codex/pull/19805#discussion_r3153265562
2026-04-28 13:06:41 +02:00
jif-oai
431ebeaef7 feat: split memories part 2 (#19860)
Keep extracting memories out of core and moving the write trigger in the
app-server
This is temporary and it should move at the client level as a follow-up
This makes core fully independant from `codex-memories-write`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 13:03:28 +02:00
jif-oai
fd36838cf3 Add MultiAgentV2 root and subagent context hints (#19805)
## Why

MultiAgentV2 sessions need startup guidance that matches the role of the
thread that is actually being created. Root agents and subagents have
different responsibilities, and forked subagents can inherit parent
rollout history. If the parent hint is carried into the child context,
the child can see stale or conflicting developer guidance before its own
session-specific context is added.

## What changed

- Added `features.multi_agent_v2.root_agent_usage_hint_text` and
`features.multi_agent_v2.subagent_usage_hint_text` config fields,
including schema/config parsing support.
- Injected the matching root or subagent hint into the initial context
as its own developer message when `multi_agent_v2` is enabled.
- Filtered configured MultiAgentV2 usage-hint developer messages out of
forked parent history so a child thread receives fresh guidance for its
own session source/config.
- Added targeted coverage for config parsing, initial-context rendering,
feature-config deserialization, and forked-history filtering.

## Context examples

With this config:

```toml
[features.multi_agent_v2]
enabled = true
root_agent_usage_hint_text = "Root guidance."
subagent_usage_hint_text = "Subagent guidance."
```

A root thread initial context renders the root hint as a standalone
developer message:

```text
[developer]
<existing developer context, when present>

[developer]
Root guidance.
```

A subagent thread initial context renders the subagent hint instead:

```text
[developer]
<existing developer context, when present>

[developer]
Subagent guidance.
```

When a subagent forks parent history, any parent developer message whose
text exactly matches the configured MultiAgentV2 root or subagent hint
is omitted from the forked history before the child receives its fresh
subagent hint.
2026-04-28 12:31:45 +02:00
xli-oai
803705f795 Add remote plugin uninstall API (#19456)
## Summary
- Adds the remote `plugin/uninstall` request form using required
`pluginId` plus optional `remoteMarketplaceName`, while preserving local
`pluginId` uninstall.
- Adds `codex_core_plugins::remote::uninstall_remote_plugin` for the
deployed ChatGPT plugin backend uninstall path and validates the backend
returns the same id with `enabled: false`.
- Routes app-server remote uninstall through feature checks, remote
plugin id validation, backend mutation, local downloaded cache deletion,
cache clearing, docs, and regenerated protocol schemas.

## Tests
- `just write-app-server-schema`
- `just fmt`
- `cargo test -p codex-app-server-protocol
plugin_uninstall_params_serialization_omits_force_remote_sync`
- `cargo test -p codex-app-server plugin_uninstall --test all`
- `cargo test -p codex-app-server plugin_uninstall`
- `cargo build -p codex-cli`
- `CODEX_BIN=/Users/xli/code/codex/codex-rs/target/debug/codex python3
/Users/xli/.codex/skills/xli-test-marketplace-api/scripts/run_marketplace_api_matrix.py`
(44 pass / 0 fail)
- `just fix -p codex-app-server-protocol -p codex-app-server -p
codex-tui`
- `just fix -p codex-app-server`
2026-04-28 03:27:53 -07:00
xl-openai
7d72fc8f53 feat: Cache remote plugin bundles on install (#19914)
Remote installs now fetch, validate, download, and cache the plugin
bundle locally
2026-04-28 00:53:27 -07:00
Eric Traut
b985768dc1 Add codex update command (#19933)
## Why

Addresses #9274

Running `codex update` currently starts an interactive Codex session
with `update` as the prompt. That is a rough edge for users who expect a
direct self-update command after seeing the existing update notice, and
it forces them to copy the suggested package-manager command manually.

## What changed

- Added a top-level `codex update` subcommand.
- Reused the existing install-channel detection and update command
runner that the TUI already uses for update prompts.
- Exposed the update-action lookup from `codex-tui` so the CLI can
invoke the same behavior.
- Added CLI coverage to ensure `codex update` is parsed as a subcommand
instead of becoming an interactive prompt.

## Verification

- `cargo test -p codex-cli`
- `cargo test -p codex-tui update_action::tests`
2026-04-27 23:33:59 -07:00
Michael Bolin
0a32c8b396 app-server-protocol: mark permission profiles experimental (#19899)
## Why

`PermissionProfile` is now the canonical internal permissions
representation, but the app-server wire shape is still intentionally
unstable while the migration continues. Stable app-server clients should
not see or generate code for these fields until the wire format settles.

## What changed

- Marks every app-server v2 field that sends `PermissionProfile` as
experimental, including `command/exec`, `thread/start`, `thread/resume`,
`thread/fork`, and `turn/start` request/response payloads.
- Enables per-field experimental inspection for `command/exec`, so
`permissionProfile` is gated without making the entire method
experimental.
- Fixes the generated TypeScript schema filter to be comment-aware. The
previous scanner treated apostrophes inside doc comments as string
delimiters, so some experimental fields leaked into stable TypeScript
even though stable JSON was filtered correctly.

## Verification

- `cargo test -p codex-app-server-protocol`










---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19899).
* #19900
* __->__ #19899
2026-04-28 06:08:34 +00:00
Michael Bolin
341550c275 permissions: store thread sessions as profiles (#19776)
## Why

After thread sessions have a required `PermissionProfile`, the TUI no
longer needs to cache a separate legacy `SandboxPolicy` in
`ThreadSessionState`. Keeping the legacy field would reintroduce two
permission authorities in the session cache and make later
replay/switching logic easier to get wrong.

This PR keeps legacy app-server compatibility at the ingestion boundary:
old `sandbox` response values are still accepted, but they are
immediately converted to a cwd-anchored profile.

## What Changed

- Removes `ThreadSessionState.sandbox_policy`.
- Updates active-session permission syncing to write only the current
`PermissionProfile`.
- Updates thread-read/replay/test fixtures to use profiles as the cached
session permission source.
- Leaves legacy `sandbox` fields in app-server request/response protocol
paths unchanged; those are compatibility boundaries and are converted
before entering cached TUI state.

## Verification

- `cargo test -p codex-tui thread_session_state::tests --lib`
- `cargo test -p codex-tui
inactive_thread_started_notification_initializes_replay_session --lib`
- `cargo test -p codex-tui thread_events --lib`
- `just fix -p codex-tui`




































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19776).
* #19900
* #19899
* __->__ #19776
2026-04-28 05:49:58 +00:00
Eric Traut
92fb848065 Allow large remote app-server resume responses (#19920)
## Why

Remote TUI resume uses the app-server websocket client. That client
inherited tungstenite's default `16 MiB` frame limit, so a large saved
session could make `thread/resume` return a single JSON-RPC response
frame that the client rejected before the TUI could deserialize or
render it.

Fixes #19837

## What Changed

- Configure the remote app-server websocket client with a bounded `128
MiB` max frame/message size.
- Preserve the concrete remote worker exit reason when completing
pending requests after a transport/read failure instead of replacing it
with a generic channel-closed error.
- Add a regression test that sends a single `>16 MiB` JSON-RPC response
frame and verifies the typed request succeeds.

Note: This isn't a perfect fix. It really just moves the limit to a much
larger value. I looked at a bunch of other potential fixes (both
server-side and client-side), and they all involved significant
complexity, had backward-compatibility impact, or impacted performance
of common use cases. This simple fix should address the vast majority of
remote use cases.

## Verification

I reproed the problem locally using a long rollout. Verified that fix
addresses connection drop.
2026-04-27 22:44:10 -07:00
Michael Bolin
fc2a69107c permissions: derive snapshot sandbox projections (#19775)
## Why

`ThreadConfigSnapshot` is used by app-server and thread metadata code as
a stable view of active runtime settings. Keeping both `sandbox_policy`
and `permission_profile` in the snapshot duplicates permission state and
makes it possible for the legacy projection to drift from the canonical
profile.

The legacy `sandbox` value is still needed at app-server compatibility
boundaries, so this PR derives it on demand from the snapshot profile
and cwd instead of storing it.

## What Changed

- Removes `ThreadConfigSnapshot.sandbox_policy`.
- Adds `ThreadConfigSnapshot::sandbox_policy()` as a compatibility
projection from `permission_profile` plus `cwd`.
- Updates app-server response/metadata code and tests to call the
projection only where legacy fields still exist.
- Keeps snapshot construction profile-only so split filesystem rules,
disabled enforcement, and external enforcement remain represented by the
canonical profile.

## Verification

- `cargo test -p codex-app-server
thread_response_permission_profile_preserves_enforcement --lib`
- `cargo test -p codex-core
dispatch_reclaims_stale_global_lock_and_starts_consolidation --lib`



































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19775).
* #19900
* #19899
* #19776
* __->__ #19775
2026-04-27 22:30:47 -07:00
Michael Bolin
bf38def44e permissions: make SessionConfigured profile-only (#19774)
## Why

`SessionConfiguredEvent` is the internal event that tells clients what
permissions are active for a session. Emitting both `sandbox_policy` and
`permission_profile` leaves two possible authorities and forces every
consumer to decide which one to honor. At this point in the migration,
the profile is expressive enough to represent managed, disabled, and
external sandbox enforcement, so the internal event can be profile-only.

The wire compatibility concern is older serialized events or rollout
data that only contain `sandbox_policy`; those still need to
deserialize.

## What Changed

- Removes `sandbox_policy` from `SessionConfiguredEvent` and makes
`permission_profile` required.
- Adds custom deserialization so old payloads with only `sandbox_policy`
are upgraded to a cwd-anchored `PermissionProfile`.
- Updates core event emission and TUI session handling to sync
permissions from the profile directly.
- Updates app-server response construction to derive the legacy
`sandbox` response field from the active thread snapshot instead of from
`SessionConfiguredEvent`.
- Updates yolo-mode display logic to treat both
`PermissionProfile::Disabled` and managed unrestricted filesystem plus
enabled network as full-access, while still preserving the distinction
between no sandbox and external sandboxing.

## Verification

- `cargo test -p codex-protocol session_configured_event --lib`
- `cargo test -p codex-protocol serialize_event --lib`
- `cargo test -p codex-exec session_configured --lib`
- `cargo test -p codex-app-server
thread_response_permission_profile_preserves_enforcement --lib`
- `cargo test -p codex-core
session_configured_reports_permission_profile_for_external_sandbox
--lib`
- `cargo test -p codex-tui session_configured --lib`
- `cargo test -p codex-tui
yolo_mode_includes_managed_full_access_profiles --lib`


































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19774).
* #19900
* #19899
* #19776
* #19775
* __->__ #19774
2026-04-27 22:06:47 -07:00
Eric Traut
5ba908d179 Avoid persisting ShutdownComplete after thread shutdown (#19630)
## Why

Fixes #19475.

`codex exec` can finish successfully and then emit an `ERROR` on stderr:

```text
failed to record rollout items: thread <id> not found
```

That happens because shutdown closes the live thread writer before
emitting `ShutdownComplete`. The terminal event was still using the
normal `send_event_raw` path, so it tried to append rollout items
through a recorder that had already been removed. The answer is correct,
but wrappers that treat stderr as failure can retry completed exec runs.

This looks like a likely recent regression from
[#18882](https://github.com/openai/codex/pull/18882), which routed live
thread writes through `ThreadStore` and added the shutdown-time live
writer close. I have not bisected this, so the PR treats #18882 as the
likely source based on the affected shutdown code path rather than a
proven first-bad commit.

## What Changed

`ShutdownComplete` now bypasses rollout persistence after thread
shutdown and is delivered directly to clients. The shutdown path still
records the protocol event in the rollout trace before delivery,
preserving trace visibility without attempting a post-shutdown
thread-store append.

The change also adds a regression test with the in-memory thread store
to assert that shutdown creates and shuts down the live thread without
appending another item after shutdown.
2026-04-27 22:02:08 -07:00
Eric Traut
b7e5588d18 Clarify PR template invitation requirement (#19912)
Addresses #19856

## Summary
- Clarifies that external code contributions are invitation only.
- Points contributors to `docs/contributing.md` for the full policy
instead of using the previous warning phrasing.
2026-04-27 21:45:15 -07:00
marksteinbrick-oai
6a8df2b61d [codex-analytics] include user agent in default headers (#17689)
## Summary
Adds the standard Codex `User-Agent` to shared default headers so the
responses-api WS handshake carries the same client OS and version
context as HTTP requests.

## Testing
- `cargo test -p codex-core
build_ws_client_metadata_includes_window_lineage_and_turn_metadata`
- `cargo test -p codex-core --test all responses_websocket`
2026-04-27 21:32:10 -07:00
efrazer-oai
c08177f7d0 refactor: load agent identity runtime eagerly (#19763)
## Summary

AgentIdentity auth previously registered the process task lazily behind
a `OnceCell`. That meant the auth object could be constructed before its
runtime task binding was known.

This PR makes AgentIdentity auth load the runtime task at auth load time
and stores the resulting process task id directly on the auth object.
The model-provider call path can then read a concrete task id instead of
handling a missing lazy value.

## Stack

1. [refactor: make auth loading
async](https://github.com/openai/codex/pull/19762) (merged)
2. **This PR:** [refactor: load AgentIdentity runtime
eagerly](https://github.com/openai/codex/pull/19763)
3. [fix: configure AgentIdentity AuthAPI base
URL](https://github.com/openai/codex/pull/19904)
4. [feat: verify AgentIdentity JWTs with
JWKS](https://github.com/openai/codex/pull/19764)

## Important call sites

| Area | Change |
| --- | --- |
| `AgentIdentityAuth::load` | Registers the process task during auth
loading and stores `process_task_id`. |
| `CodexAuth::from_agent_identity_jwt` | Awaits AgentIdentity auth
loading. |
| model-provider auth | Reads a concrete `process_task_id` instead of an
optional lazy value. |
| AgentIdentity auth tests | Mock task registration now covers eager
runtime allocation. |

## Design decisions

AgentIdentity auth now treats task registration as part of constructing
a usable auth object. That matches how callers use the value: once auth
is present, the model-provider path expects the task-scoped assertion
data to be ready.

## Testing

Tests: targeted Rust auth test compilation, formatter, scoped Clippy
fix, and Bazel lock check.
2026-04-27 21:09:26 -07:00
canvrno-oai
2307aa8d98 Allow /statusline and /title slash commands during active turns (#19917)
- Marks `/title` and `/statusline` as available during active tasks.
- Extends the existing slash-command availability test coverage to
include these commands alongside `/goal`.
2026-04-27 20:57:20 -07:00
Michael Bolin
af95662a70 permissions: require profiles in TUI thread state (#19773)
## Why

`ThreadSessionState` is the TUI's cached view of an app-server session.
To make `PermissionProfile` the canonical runtime permissions model,
cached thread sessions need to always have a profile instead of treating
the profile as an optional supplement to a legacy `sandbox` response
field.

The main compatibility concern is older app-server v2 lifecycle
responses that only include `sandbox` and omit `permissionProfile`:

- `thread/start` -> `ThreadStartResponse.sandbox`
- `thread/resume` -> `ThreadResumeResponse.sandbox`
- `thread/fork` -> `ThreadForkResponse.sandbox`

Those responses must still hydrate correctly when the TUI is pointed at
an older app-server. This PR converts the legacy `sandbox` value into a
`PermissionProfile` immediately at response ingestion time, using the
response `cwd`, so cached sessions do not carry an optional profile that
can later reinterpret cwd-bound grants against a different thread cwd.

This fallback is intentionally boundary compatibility. The follow-up PRs
in this stack continue the cleanup by making `SessionConfiguredEvent`
profile-only, deriving sandbox projections from snapshots only when an
API still needs them, and then removing `sandbox_policy` from
`ThreadSessionState`.

## What Changed

- Makes `ThreadSessionState.permission_profile` required.
- Converts legacy app-server response `sandbox` values into a
`PermissionProfile` at ingestion time using the response cwd.
- Ensures `thread/read` hydration does not reuse a primary session
profile that may be anchored to a different cwd; it uses the active
widget permission settings for the read thread fallback instead of
reusing cached primary-session permissions.
- Keeps the app-server request path unchanged: embedded sessions send
profiles, while remote sessions continue using legacy sandbox overrides
for compatibility.

## Verification

- `cargo test -p codex-tui thread_read --lib`
- `cargo test -p codex-tui
permission_settings_sync_preserves_active_profile_only_rules --lib`
- `cargo test -p codex-tui
resume_response_restores_turns_from_thread_items --lib`
- `cargo test -p codex-tui thread_session_state::tests --lib`




























---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19773).
* #19900
* #19899
* #19776
* #19775
* #19774
* __->__ #19773
2026-04-27 20:39:06 -07:00
pakrym-oai
4e05f3053c Remove ghost snapshots (#19481)
## Summary
- Remove `ghost_snapshot` / `GhostCommit` from the Responses API surface
and generated SDK/schema artifacts.
- Keep legacy config loading compatible, but make undo a no-op that
reports the feature is unavailable.
- Clean up core history, compaction, telemetry, rollout, and tests to
stop carrying ghost snapshot items.

## Testing
- Unit tests passed for `codex-protocol`, `codex-core` targeted undo and
compaction flows, `codex-rollout`, and `codex-app-server-protocol`.
- Regenerated config and app-server schemas plus Python SDK artifacts
and verified they match the checked-in outputs.
2026-04-27 18:48:57 -07:00
Dylan Hurd
7e8594fc19 Stabilize plugin MCP fixture tests (#19452)
## Why

Recent `main` CI had repeated flakes in the plugin fixture tests:

- `codex-core::all
suite::plugins::explicit_plugin_mentions_inject_plugin_guidance` failed
in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).
- `codex-core::all suite::plugins::plugin_mcp_tools_are_listed` failed
in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).

The failures were in the same plugin/MCP fixture family: assertions
expected sample plugin guidance or tool inventory, but the test could
observe the session before the sample MCP server had finished startup.

## Root Cause

`explicit_plugin_mentions_inject_plugin_guidance` submitted the user
turn immediately after constructing the session. MCP startup is
asynchronous, so on a slower or busier CI runner the prompt could be
built before the sample plugin MCP server had reported its tools. That
made the test depend on scheduler timing rather than the fixture being
ready.

`plugin_mcp_tools_are_listed` already needed the same readiness
condition, but its wait logic was local to that test.

## What Changed

- Added a shared `wait_for_sample_mcp_ready` helper for the plugin
fixture tests.
- Wait for `McpStartupComplete` before submitting the explicit plugin
mention turn.
- Reuse the same readiness helper in the MCP tool-listing test.

## Why This Should Be Reliable

The tests now wait for the explicit readiness signal from the sample MCP
server before asserting guidance or tools derived from that server. This
removes the startup race while still exercising the real fixture path,
so the assertions should only run after the plugin inventory is
deterministic.

## Verification

- `cargo test -p codex-core --test all plugins::`
- GitHub CI for this PR is passing.
2026-04-28 01:14:44 +00:00
Michael Zeng
a3350de855 Refactor exec-server filesystem API into codex-file-system (#19892)
## Summary
- Extracted the shared filesystem types and `ExecutorFileSystem` trait
into a new `codex-file-system` crate
- Switched `codex-config` and `codex-git-utils` to depend on that crate
instead of `codex-exec-server`
- Kept `codex-exec-server` re-exporting the same API for existing
callers

## Testing
- Ran `cargo test -p codex-file-system`
- Ran `cargo test -p codex-git-utils`
- Ran `cargo test -p codex-config`
- Ran `cargo test -p codex-exec-server`
- Ran `just fix -p codex-file-system`, `just fix -p codex-git-utils`,
`just fix -p codex-config`, `just fix -p codex-exec-server`
- Ran `just fmt`
- Updated and verified the Bazel module lockfile
2026-04-27 17:43:15 -07:00
colby-oai
2f3b5ed81a disallow fileparams metadata for custom mcps (#19836)
## Summary
Disallow fileParams metadata for custom MCPs 

Restricts Codex openai/fileParams handling to the first-party codex_apps
MCP server. Custom MCP servers may still advertise the metadata, but
Codex now ignores it for upload rewriting, preventing non-Apps tools
from receiving signed OpenAI file refs for local paths. Added a
regression test for the allowed and denied cases.
2026-04-27 20:42:10 -04:00
Michael Bolin
755880ef9c permissions: derive config defaults as profiles (#19772)
## Why

This continues the permissions migration by making legacy config default
resolution produce the canonical `PermissionProfile` first. The legacy
`SandboxPolicy` projection should stay available at compatibility
boundaries, but config loading should not create a legacy policy just to
immediately convert it back into a profile.

Specifically, when `default_permissions` is not specified in
`config.toml`, instead of creating a `SandboxPolicy` in
`codex-rs/core/src/config/mod.rs` and then trying to derive a
`PermissionProfile` from it, we use `derive_permission_profile()` to
create a more faithful `PermissionProfile` using the values of
`ConfigToml` directly.

This also keeps the existing behavior of `sandbox_workspace_write` and
extra writable roots after #19841 replaced `:cwd` with `:project_roots`.
Legacy workspace-write defaults are represented as symbolic
`:project_roots` write access plus symbolic project-root metadata
carveouts. Extra absolute writable roots are still added directly and
continue to get concrete metadata protections for paths that exist under
those roots.

The platform sandboxes differ when a symbolic project-root subpath does
not exist yet.

* **Seatbelt** can encode literal/subpath exclusions directly, so macOS
emits project-root metadata subpath policies even if `.git`, `.agents`,
or `.codex` do not exist.
* **bwrap** has to materialize bind-mount targets. Binding `/dev/null`
to a missing `.git` can create a host-visible placeholder that changes
Git repo discovery. Binding missing `.agents` would not affect Git
discovery, but it would still create a host-visible project metadata
placeholder from an automatic compatibility carveout. Linux therefore
skips only missing automatic `.git` and `.agents` read-only metadata
masks; missing `.codex` remains protected so first-time project config
creation goes through the protected-path approval flow. User-authored
`read` and `none` subpath rules keep normal bwrap behavior, and `none`
can still mask the first missing component to prevent creation under
writable roots.

## What Changed

- Adds profile-native helpers for legacy workspace-write semantics,
including `PermissionProfile::workspace_write_with()`,
`FileSystemSandboxPolicy::workspace_write()`, and
`FileSystemSandboxPolicy::with_additional_legacy_workspace_writable_roots()`.
- Makes `FileSystemSandboxPolicy::workspace_write()` the single legacy
workspace-write constructor so both `from_legacy_sandbox_policy()` and
`From<&SandboxPolicy>` include the project-root metadata carveouts.
- Removes the no-carveout `legacy_workspace_write_base_policy()` path
and the `prune_read_entries_under_writable_roots()` cleanup that was
only needed by that split construction.
- Adds `ConfigToml::derive_permission_profile()` for legacy sandbox-mode
fallback resolution; named `default_permissions` profiles continue
through the permissions profile pipeline instead of being reconstructed
from `sandbox_mode`.
- Updates `Config::load()` to start from the derived profile, validate
that it still has a legacy compatibility projection, and apply
additional writable roots directly to managed workspace-write filesystem
policies.
- Updates Linux bwrap argument construction so missing automatic
`.git`/`.agents` symbolic project-root read-only carveouts are skipped
before emitting bind args; missing `.codex`, user-authored `read`/`none`
subpath rules, and existing missing writable-root behavior are
preserved.
- Adds coverage that legacy workspace-write config produces symbolic
project-root metadata carveouts, extra legacy workspace writable roots
still protect existing metadata paths such as `.git`, and bwrap skips
missing `.git`/`.agents` project-root carveouts while preserving missing
`.codex` and user-authored missing subpath rules.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19772).
* #19776
* #19775
* #19774
* #19773
* __->__ #19772
2026-04-27 16:50:10 -07:00
pakrym-oai
c5a495c2cd Streamline review and feedback handlers (#19498)
## Why

The remaining review, interrupt, fuzzy search, feedback, and git-diff
handlers still had local send-error branches that obscured otherwise
simple request handling. This final slice flattens those handlers
without changing the public protocol behavior.

## What Changed

- Streamlined review start, turn interrupt, fuzzy search session,
feedback upload, and git diff handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.
- Converted validation and upload failures into returned JSON-RPC errors
where that avoids nested `send_error`/`return` blocks.
- Left unrelated sandbox setup and notification code untouched.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::review --
--test-threads=1`
2026-04-27 16:36:04 -07:00
Matthew Zeng
dcd139b7c4 Add MCP app feature flag (#19884)
## Summary
- Add the `enable_mcp_apps` feature flag to the `codex-features`
registry
- Keep it under development and disabled by default

## Testing
- Unit tests for `codex-features` passed
- Formatting passed
2026-04-27 16:28:49 -07:00
canvrno-oai
e64c765673 Show action required in terminal title (#18372)
Implements #18162

This updates the TUI terminal title to show an explicit action-required
state when Codex is blocked on user approval or input. The terminal
title now uses the activity title item to cover both active work and
blocked-on-user states, while still accepting the legacy spinner config
value.

Changes
- Rename the terminal title item from `spinner` to `activity` while
preserving legacy config compatibility
- Show `[ ! ] Action Required `while approval or input overlays are
active, with a blinking `[ . ]` alternate state
- Suppress the normal working spinner while Codex is blocked on user
action
- Add targeted coverage for action-required title behavior and legacy
title-item parsing

Testing
- Trigger an approval or input modal and confirm the tab title
alternates between `[ ! ] Action Required` and `[ . ] Action Required`
- Disable the activity title item and confirm the action-required title
does not appear
- Resolve the prompt and confirm the title returns to the normal
spinning/idel state


https://github.com/user-attachments/assets/e9ecc530-a6be-4fd7-b9a6-d550a790eb2c
2026-04-27 15:27:11 -07:00
pakrym-oai
e903d000b0 Streamline turn and realtime handlers (#19497)
## Why

Turn and realtime handlers had nested validation and send-error branches
that made the request path longer than the behavior warranted. This
slice keeps the same request semantics while letting the handlers return
errors from the failing step.

## What Changed

- Streamlined turn start, injected item, and turn steer request handling
in `codex-rs/app-server/src/codex_message_processor.rs`.
- Applied the same result-returning shape to realtime session response
handlers.
- Preserved existing request validation and thread-manager interactions.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::turn_start --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::turn_steer --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::thread_inject_items --
--test-threads=1`
2026-04-27 15:21:59 -07:00
pakrym-oai
739ab6bc51 Streamline thread resume and fork handlers (#19495)
## Why

Thread resume and fork had some of the deepest error-handling
indentation in this area because helpers emitted request errors
directly. Returning those failures gives the handlers a single request
boundary while preserving the async pending-resume behavior.

## What Changed

- Converted thread resume helpers in
`codex-rs/app-server/src/codex_message_processor.rs` to return `Result`
values for validation and view loading failures.
- Applied the same pattern to thread fork request handling.
- Simplified pending resume error construction by using the shared
JSON-RPC error helpers.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::thread_resume --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::thread_fork --
--test-threads=1`
2026-04-27 15:04:53 -07:00
cassirer-openai
30c5c768de [codex] Trace cancelled inference streams (#19839)
Records cancelled inference streams when Codex stops consuming a
provider response before `response.completed`, preserving complete
output items observed before cancellation.

Also closes still-running inference calls when the owning turn ends, so
reduced rollout traces do not leave stale `Running` inference nodes.

Covered by focused reducer coverage and a core stream-drop test for
partial output preservation.
2026-04-27 21:58:29 +00:00
pakrym-oai
2be9fd5a93 Streamline thread read handlers (#19494)
## Why

The thread read/list handlers mostly assemble views, but their error
handling was interleaved with response emission. Returning view-building
errors from the helper path keeps those handlers focused on data
assembly.

## What Changed

- Added a small mapper for `ThreadReadViewError` to JSON-RPC errors in
`codex-rs/app-server/src/codex_message_processor.rs`.
- Streamlined thread list, loaded-thread, read, turn-list, and summary
handlers to produce result values for the request boundary.
- Kept the existing invalid-request vs internal-error distinctions for
missing or unreadable thread data.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all conversation_summary --
--test-threads=1`
2026-04-27 14:30:24 -07:00
Steve Coffey
0f40261e86 Publish Python SDK with Codex-pinned versioning (#18996)
**note**: a large chunk of this diff comes from regenerating Python
types after app-server schema changes on `main`.

This is PR 3 of 3 for the Python SDK PyPI publishing split. PR #18862
refreshed the generated SDK surface, and PR #18865 made the runtime
package publishable as `openai-codex-cli-bin`; this final PR makes the
SDK package publishable as `openai-codex-app-server-sdk` and pins both
packages to the same Codex runtime version.

The key idea is that the published SDK version is the Codex runtime
version. That one version now drives the SDK package version, the exact
runtime dependency, the client version reported by the SDK, and the
bootstrap runtime pin. This keeps release-time versioning in one lane
instead of scattering checked-in literals through the package.

## What changed

- Rename the SDK distribution from `codex-app-server-sdk` to
`openai-codex-app-server-sdk` for conflict-free PyPI publishing.
- Use `stage-sdk --codex-version ...` with one Codex version for both
the SDK package version and exact `openai-codex-cli-bin` dependency.
- Preserve hidden legacy `--runtime-version` / `--sdk-version` args only
to reject mismatched versions during staging.
- Map PEP 440 package versions back to Codex release tags for runtime
setup downloads, e.g. `0.116.0a1` -> `rust-v0.116.0-alpha.1`.
- Derive `codex_app_server.__version__`, the default
`AppServerConfig.client_version`, and
`_runtime_setup.pinned_runtime_version()` from the SDK package/project
version instead of hardcoding duplicate version strings.
- Carry the current generated SDK refresh from `main` so
`generate-types` stays clean after recent app-server schema changes.
- Update `sdk/python/uv.lock` for the renamed editable package.

## Validation

- `uv run --extra dev pytest` in `sdk/python` -> 59 passed, 37 skipped.
- Targeted `uv run ruff check` for the touched SDK files.
- `git diff --check`.
- Staged runtime with `--codex-version rust-v0.116.0-alpha.1
--platform-tag macosx_11_0_arm64`.
- Staged SDK with `--codex-version rust-v0.116.0-alpha.1`.
- Built runtime wheel, SDK wheel, and SDK sdist.
- `twine check /tmp/codex-python-pr3-build/dist/*` -> passed.
- Clean venv smoke installed `openai-codex-app-server-sdk==0.116.0a1`
from local dist and pulled `openai-codex-cli-bin==0.116.0a1`.
- Smoke imports passed for `Codex` and `bundled_codex_path()`.
2026-04-27 14:28:46 -07:00
starr-openai
4ded800374 [codex] Shard exec Bazel integration test (#19862)
## Summary

- shard `//codex-rs/exec:exec-all-test` into 8 Bazel shards
- keep the existing `no-sandbox` test tag unchanged

## Why

The Windows Bazel lane has been timing out this aggregated integration
test target at the default 300s test timeout. The target runs the
combined `codex-rs/exec/tests/all.rs` integration binary; sharding lets
Bazel split the Rust test cases across parallel test actions instead of
running the whole integration suite as one long action.

## Validation

Not run locally, per the Codex repo workflow for development-phase
changes.

Co-authored-by: Codex <noreply@openai.com>
2026-04-27 21:22:53 +00:00
pakrym-oai
5c30d79afb Streamline thread mutation handlers (#19493)
## Why

Thread mutation handlers had many short error branches whose only job
was to emit a JSON-RPC error and stop. This slice keeps those errors
visible, but lets each handler build a result and return early from
validation helpers instead of nesting the main path.

## What Changed

- Streamlined thread archive/unarchive, rename, memory, metadata,
rollback, compact, background terminal, shell, and guardian handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.
- Reused shared JSON-RPC error constructors in
`codex-rs/app-server/src/bespoke_event_handling.rs` for rollback-related
request failures.
- Preserved direct `send_error` calls where they remain the simplest
boundary for pending async event responses.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::thread_rollback --
--test-threads=1`
2026-04-27 14:18:55 -07:00
joeytrasatti-openai
798de22637 [codex-backend] Prefer state git metadata in filtered thread lists (#19874)
### Summary

- `thread/list` filtered filesystem results already overlay state DB
metadata, but the existing merge only filled missing git fields.
- Prefer non-null SQLite git metadata over stale non-null rollout values
so persisted branch/SHA/origin updates are reflected in filtered thread
lists.
- Update the focused merge test to cover stale filesystem git metadata
being replaced by state-backed values.

### Testing

now getting expected icons
<img width="426" height="913" alt="Screenshot 2026-04-27 at 1 45 45 PM"
src="https://github.com/user-attachments/assets/027fb7e7-f54d-4353-8423-cb76f3c8f5ac"
/>
2026-04-27 21:02:40 +00:00
pakrym-oai
c5e2921e1d Streamline thread start handler (#19492)
## Why

The thread start handler mixed request validation, thread construction,
dynamic-tool validation, and JSON-RPC error emission in one nested flow.
Returning request errors from the helper path makes the successful setup
path easier to follow.

## What Changed

- Reworked `thread/start` handling in
`codex-rs/app-server/src/codex_message_processor.rs` so helper methods
return `Result` and the handler emits one result.
- Moved dynamic-tool validation failures into returned JSON-RPC errors
instead of local `send_error` branches.
- Preserved the existing thread creation and task-spawning behavior.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::dynamic_tools --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::turn_start --
--test-threads=1`
2026-04-27 13:56:20 -07:00
Michael Bolin
4b55979755 permissions: remove cwd special path (#19841)
## Why

The experimental `PermissionProfile` API had both `:cwd` and
`:project_roots` special filesystem paths, which made the permission
root ambiguous. This PR removes the unstable `current_working_directory`
special path before the permissions API is stabilized, so callers use
`:project_roots` for symbolic project-root access.

## What changed

- Removes `FileSystemSpecialPath::CurrentWorkingDirectory` from protocol
and app-server protocol models, plus regenerated app-server
JSON/TypeScript schemas.
- Replaces internal `:cwd` permission entries with `:project_roots`
entries.
- Keeps the existing cwd-update behavior for legacy-shaped
workspace-write profiles, while removing the deleted
`CurrentWorkingDirectory` case from that compatibility path.
- Keeps `PermissionProfile::workspace_write()` as the reusable symbolic
workspace-write helper, with docs noting that `:project_roots` entries
resolve at enforcement time.
- Updates app-server docs/examples and approval UI labeling to stop
advertising `:cwd` as a permission token.

## Compatibility

Persisted rollout items may contain the old
`{"kind":"current_working_directory"}` tag from earlier experimental
`permissionProfile` snapshots. This PR keeps that tag as a
deserialize-only alias for `ProjectRoots { subpath: None }`, while
continuing to serialize only the new `project_roots` tag.

## Follow-up

This PR intentionally does not introduce an explicit project-root set on
`SessionConfiguration` or runtime sandbox resolution. Today, the
resolver still uses the active cwd as the single implicit project root.
A follow-up should model project roots separately from tool cwd so
`:project_roots` entries can resolve against the configured project
roots, and resolve to no entries when there are no project roots.

## Verification

- `cargo test -p codex-protocol permissions:: --lib`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-sandboxing -p codex-exec-server --lib`
- `cargo test -p codex-core session_configuration_apply_ --lib`
- `cargo test -p codex-app-server
command_exec_permission_profile_project_roots_use_command_cwd --test
all`
- `cargo test -p codex-tui
thread_read_session_state_does_not_reuse_primary_permission_profile
--lib`
- `cargo test -p codex-tui
preset_matching_accepts_workspace_write_with_extra_roots --lib`
- `cargo test -p codex-config --lib`
2026-04-27 13:41:27 -07:00
Eric Traut
52c06b8759 Preserve TUI markdown list spacing after code blocks (#19706)
## Why

Fixes #19702.

The TUI markdown renderer could visually attach the next list marker to
a fenced code block inside the previous list item, even when the source
markdown included a blank line before the next item. That made
block-heavy loose lists harder to read, while the desired behavior is
still to keep simple lists compact.

## What changed

- Track whether the current rendered list item contains a code block.
- Preserve one blank separator before the following list marker only
when the previous item contained a code block.
- Add regression coverage for both paths: code-block list items keep the
separator, and simple loose list items stay compact.

## Verification

- `cargo test -p codex-tui markdown_render`

I also manually verified that the bug exists before and is fixed after.

## Before
<img width="437" height="240" alt="Screenshot 2026-04-26 at 1 19 01 PM"
src="https://github.com/user-attachments/assets/3bc9d64d-2dba-40d9-9d6b-a1d0b3c0f728"
/>

## After
<img width="410" height="269" alt="Screenshot 2026-04-26 at 1 18 54 PM"
src="https://github.com/user-attachments/assets/19c15bee-da32-455e-a7cb-e05eb85f4ea0"
/>
2026-04-27 13:40:46 -07:00
Eric Traut
0bd25ab374 Delay approval prompts while typing (#19513)
## Why

Fixes #7744. Approval modals can currently appear while the user is
typing ahead in the TUI composer, which lets plain letters like `y` or
`a` get consumed as approval shortcuts instead of staying in the draft
input.

## What changed

- Track recent composer typing activity in `bottom_pane/mod.rs`.
- Delay new approval overlays for 1 second while the composer is active,
keeping delayed requests queued until the user is idle.
- Preserve the existing active-overlay behavior so approvals that arrive
while an approval modal is already open are still queued into that
overlay.
- Prune delayed approvals when app-server resolution says the request
has already been handled.

## Verification

Added unit coverage for immediate approvals, delayed approvals, idle
deadline reset, typed shortcut letters staying in the composer, shortcut
handling after the delay, and resolved delayed-request pruning.

Focused `codex-tui` test groups pass locally. The full `cargo test -p
codex-tui` run currently aborts in
`app::tests::attach_live_thread_for_selection_rejects_unmaterialized_fallback_threads`;
that same test also fails when run alone with the same stack overflow.

Manual reviewer check:

1. Start the TUI from the repo root:

   ```bash
   RUST_LOG=trace just codex \
     -c log_dir=<temp-log-dir> \
     --ask-for-approval untrusted \
     --sandbox workspace-write
   ```

2. Submit this prompt:

   ```text
   create a file text.txt on my desktop
   ```

3. While the agent is preparing the approval request, immediately type
text such as `ya this should stay in the composer`.
4. Confirm the typed-ahead `y`/`a` remains in the composer instead of
approving the request.
5. Stop typing for about 1 second; the approval modal should then
appear.
6. Once the modal is visible, press `y` and confirm the approval
shortcut works normally.
2026-04-27 13:20:55 -07:00
Eric Traut
850f035b8c Fix filtered thread-list resume regression in TUI (#19591)
## Why

`codex resume` regressed after
[#18502](https://github.com/openai/codex/pull/18502) changed the default
`thread/list` scan-and-repair path for metadata-filtered listings. The
TUI resume picker uses `thread/list` with source/provider/cwd filters
and `useStateDbOnly: false`, which is the intended
correctness-preserving mode: it should still consult the filesystem so
healthy, missing, or stale SQLite state can be repaired.

The regression was that #18502 made that filtered, filesystem-backed
path call `reconcile_rollout` for every filesystem hit, and then call it
again for each SQLite hit. When `reconcile_rollout` does not already
have extracted rollout items, it falls back to loading the full JSONL
rollout. That changed the resume picker’s first page from a cheap
rollout-head scan plus SQLite read-repair into full-file reads for large
sessions, so a few long threads could dominate TUI startup/resume
latency.

This change addresses the regression by keeping `useStateDbOnly: false`
on the correctness-preserving path while avoiding unnecessary full JSONL
reads for rows the filesystem scan has already validated.
Source/provider/cwd filters can be decided from rollout-head metadata,
so non-search resume listings only need the lightweight read-repair path
for filesystem hits. Full reconciliation is still used for DB-only
filtered rows because those can be stale false positives, and for search
listings because search can depend on title metadata that may require
scanning the full rollout.

This fixes #19483.

## What changed

- For non-search filtered listings, repair filesystem hits with the
lightweight `read_repair_rollout_path` path instead of full
`reconcile_rollout`.
- Track thread IDs proven by the filesystem scan and only fully
reconcile SQLite-filtered hits that the filesystem scan did not return,
preserving stale-DB false-positive cleanup without full-reading every
healthy rollout.
- Leave search listings on full reconciliation, since search depends on
full title metadata rather than only source/provider/cwd metadata from
the rollout head.

## Verification

- `cargo test -p codex-rollout list_threads`
- `cargo test -p codex-app-server thread_list`
2026-04-27 13:02:39 -07:00
Curtis 'Fjord' Hawthorne
277186ec85 Cap original-detail image token estimates (#19865)
Clamp original-detail image patch estimates to the current 10k patch
budget so large images cannot inflate local context accounting without
bound. Add regression coverage for an over-budget image.

Fixes openai/codex#19806.
2026-04-27 12:39:24 -07:00
rhan-oai
215d5a8f7c [codex-analytics] remove ga flag (#19863) 2026-04-27 19:29:19 +00:00
sayan-oai
85c1500569 fix: filter dynamic deferred tools from model_visible_specs (#19771)
fixes #19486

### Problem
Right now dynamic deferred tools are filtered at normal-turn prompt
building time, rather than upstream while building the `ToolRouter`
itself. This causes issues because dynamic deferred tools are then
wrongly included in the router's `model_visible_specs`, which is what
the compaction request-building flow relies on.

### Fix
Move the dynamic deferred tool filtering to `ToolRouter` creation time
to solve this problem for every request that relies on `ToolRouter` for
`model_visible_specs`, which solves the issue generically.

### Tests
Added unit + integration tests to ensure dynamic deferred tools are
omitted from `model_visible_specs` and compaction request respectively.

Tested against live `/compact` endpoint; raw deferred dynamic tools
without `tool_search` returned `400` (current bug), while the filtered
payload (this fix) returns `200`.
2026-04-27 19:09:02 +00:00
pakrym-oai
e5709db6dc Streamline account and command handlers (#19491)
## Why

Account login/logout and command exec handlers were doing local error
sends in the middle of each handler. That made these request flows
branch heavily even though most of the logic is validate, perform the
operation, and return the response.

## What Changed

- Converted ChatGPT/API-key login, login cancel, logout, rate-limit, and
add-credit handlers in
`codex-rs/app-server/src/codex_message_processor.rs` to compute `Result`
values and send them once at the request boundary.
- Applied the same shape to command exec start/write/resize/terminate
handlers.
- Kept side-effect notifications in the same places after successful
request handling.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::account --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::command_exec --
--test-threads=1`
2026-04-27 12:03:49 -07:00
Michael Bolin
cafe717dca ci: migrate Bazel setup away from archived setup-bazelisk (#19851)
## Why

All Bazel CI jobs are currently blocked in the `setup-bazelisk` step
while trying to download Bazelisk.
[`bazelbuild/setup-bazelisk`](https://github.com/bazelbuild/setup-bazelisk)
is archived, and its README now recommends migrating to
[`bazel-contrib/setup-bazel`](https://github.com/bazel-contrib/setup-bazel),
so leaving our workflows on the archived action leaves CI exposed to
exactly this sort of outage.

Because `v8-canary` now consumes the shared local `setup-bazel-ci`
action, that workflow also needs to trigger when the action changes.
Without that follow-up, Bazel bootstrap regressions specific to the V8
canary path could be skipped by the workflow path filters.

## What Changed

- Switched `.github/actions/setup-bazel-ci/action.yml` from
`bazelbuild/setup-bazelisk` to `bazel-contrib/setup-bazel`, pinned to
`0.19.0`.
- Left `bazelisk-version` unset so GitHub-hosted runners can use their
preinstalled Bazelisk instead of downloading `1.x` at job start.
- Updated `.github/workflows/rusty-v8-release.yml` and
`.github/workflows/v8-canary.yml` to use the shared `setup-bazel-ci`
action instead of referencing `setup-bazelisk` directly.
- Added `.github/actions/setup-bazel-ci/**` to the `pull_request` and
`push` path filters in `.github/workflows/v8-canary.yml` so changes to
the shared Bazel setup action still run the canary workflow.
- Kept the existing repository-cache and Windows-specific Bazel setup
logic intact.

This keeps Bazel version selection anchored by `.bazelversion` while
removing the failing dependency on the archived setup action.

## Verification

- Searched `.github/` to confirm there are no remaining `setup-bazelisk`
references.
- Parsed the updated workflow and action YAML locally with Ruby's
`YAML.load_file`.
2026-04-27 11:37:30 -07:00
Michael Bolin
c2084552d9 ci: pin npm staging smoke test to a recent rust-release run (#19854)
## Why

The `build-test` workflow stages a representative `codex` npm tarball by
asking `scripts/stage_npm_packages.py` to look up a past `rust-release`
run for a hardcoded release version. That started failing in CI because
the representative version in `.github/workflows/ci.yml` was stale:

- the workflow was still using `0.115.0`
- `stage_npm_packages.py` resolves native artifacts by looking for a
`rust-release` run on the `rust-v<version>` branch
- that lookup no longer found a matching run for `rust-v0.115.0`, so the
smoke test failed before it could stage the package

This PR makes that smoke test depend on a known-good recent release run
instead of an older branch lookup that is no longer reliable.

## What Changed

- Updated the representative release version in
`.github/workflows/ci.yml` from `0.115.0` to `0.125.0`.
- Added an explicit `WORKFLOW_URL` pointing at a recent successful
`rust-release` run:
`https://github.com/openai/codex/actions/runs/24901475298`.
- Passed that URL to `scripts/stage_npm_packages.py` via
`--workflow-url` so the job can reuse the expected native artifacts
directly instead of relying on `gh run list --branch rust-v<version>` to
discover them.

That keeps the npm staging smoke test representative while making it
less sensitive to older release branch history disappearing from the
GitHub Actions lookup path.

## Verification

- Inspected the failing CI log from `build-test` and confirmed the
failure came from `scripts/stage_npm_packages.py` being unable to
resolve `rust-v0.115.0`.
- Confirmed that
`https://github.com/openai/codex/actions/runs/24901475298` is a
successful `rust-release` run for `rust-v0.125.0`.
2026-04-27 11:32:48 -07:00
efrazer-oai
2009f6e894 refactor: make auth loading async (#19762)
## Summary

Auth loading used to expose synchronous construction helpers in several
places even though some auth sources now need async work. This PR makes
the auth-loading surface async and updates the callers to await it.

This is intentionally only plumbing. It does not change how
AgentIdentity tokens are decoded, how task runtime ids are allocated, or
how JWT signatures are verified.

## Stack

1. **This PR:** [refactor: make auth loading
async](https://github.com/openai/codex/pull/19762)
2. [refactor: load AgentIdentity runtime
eagerly](https://github.com/openai/codex/pull/19763)
3. [feat: verify AgentIdentity JWTs with
JWKS](https://github.com/openai/codex/pull/19764)

## Important call sites

| Area | Change |
| --- | --- |
| `codex-login` auth loading | `CodexAuth` and `AuthManager`
construction paths now await auth loading. |
| app-server startup | Auth manager construction is awaited during
initialization. |
| CLI/TUI/exec/MCP/chatgpt callers | Existing auth-loading calls now
await the same behavior. |
| cloud requirements storage loader | The loader becomes async so it can
share the same auth construction path. |
| auth tests | Tests that load auth now run in async contexts. |

## Testing

Tests: targeted Rust auth test compilation, formatter, scoped Clippy
fix, and Bazel lock check.
2026-04-27 11:00:27 -07:00
pakrym-oai
4ed22fc7d2 Streamline plugin, apps, and skills handlers (#19490)
## Why

The plugin, app, and skills handlers had a lot of repeated
`send_error`/`return` branches that made the success path hard to scan.
This slice keeps behavior the same while moving fallible steps into
local response-producing helpers, so the request boundary can send one
result.

## What Changed

- Converted plugin list/install/uninstall handlers in
`codex-rs/app-server/src/codex_message_processor/plugins.rs` to return
`Result<*Response, JSONRPCErrorError>` from helper methods and call
`send_result` once.
- Added local error-mapping helpers for plugin install/uninstall and
marketplace failures.
- Applied the same mechanical shape to app list, skills list/config, and
marketplace add/remove/upgrade handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::app_list --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::plugin_ --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::skills_list --
--test-threads=1`
2026-04-27 10:18:25 -07:00
Eric Traut
48dd7b58f0 Render delegated patch approval details (#19709)
## Why

Fixes #19632.

When a delegated agent requests approval for an in-progress file change,
the parent TUI handles that request from an inactive thread. The app
server already sent the `FileChange` item with the proposed diff, but
the inactive-thread approval path was not recovering and rendering it
the same way as the active-thread path.

The result was an inconsistent approval prompt: main-thread edits show a
normal patch preview history item before the approval modal, while
delegated edits did not show that preview in the transcript flow.

## What Changed

- Recover buffered or historical `FileChange` item changes when building
inactive-thread file-change approval requests.
- Reuse the app-server file-change conversion helper for both live
transcript rendering and inactive-thread approvals.
- Render recovered delegated patches as a normal patch preview history
cell before the approval modal.
- Keep apply-patch approval modals focused on the decision prompt and
optional metadata; they do not render a synthetic command line or embed
the diff body.

## Manual Repro And Verification

I manually reproduced the issue using a file under `~/Desktop` so the
write would require approval.

Before the fix:

1. Ask the main thread: `Use apply_patch, not shell redirection or
Python, to create ~/Desktop/bug1.txt with three short lines.`
2. Observe the expected TUI shape: the transcript shows a normal patch
preview such as `• Added ~/Desktop/bug1.txt (+N -0)` above the approval
modal, and the modal contains only the approval prompt/options without a
synthetic command line.
3. Ask for the delegated path: `Spawn a worker. Have it use apply_patch,
not shell redirection or Python, to create ~/Desktop/bug1.txt with four
short lines.`
4. Observe the delegated approval is inconsistent: the parent view does
not render the proposed patch as the normal transcript preview before
the modal, so the diff context is missing from the stream or appears
inside the modal instead of in the history flow.

After the fix:

1. Repeat the delegated worker prompt with `apply_patch`.
2. Confirm the parent view renders the same normal patch preview history
cell (`• Added ~/Desktop/bug1.txt (+N -0)` plus the diff) immediately
before the approval modal.
3. Confirm the approval modal remains focused on the decision prompt.
For delegated approvals it may show the worker thread label, but it
should not show a `$ apply_patch` command line or embed the diff body in
the modal.
2026-04-27 10:07:15 -07:00
Eric Traut
0e2300c02c Persist shell mode commands in prompt history (#19618)
## Why

`!` shell commands are currently surfaced as "Bash mode", which is
misleading for users running shells such as PowerShell or zsh. Those
commands also bypass the persistent prompt history path, so they cannot
be recalled after starting a new session.

Fixes #19613.

## What changed

- Rename the TUI footer label and related test wording from "Bash mode"
to "Shell mode".
- Persist accepted `!` shell commands to prompt history with the leading
`!`, so recall restores the composer into shell mode across sessions.
- Add coverage for immediate and queued shell-command submissions
emitting the prompt-history update.

## Verification

- `cargo test -p codex-tui bang_shell`
- `cargo test -p codex-tui shell_command_uses_shell_accent_style`
- `cargo test -p codex-tui footer_mode_snapshots`
- `cargo insta pending-snapshots --manifest-path tui/Cargo.toml`

Manually verified fix after confirming presence of bug prior to fix.
2026-04-27 09:54:25 -07:00
Eric Traut
6c51bf0c7c Hide rewind preview when no user message exists (#19510)
## Why

Fixes #19508.

In a fresh TUI session, pressing `Esc` twice entered the rewind
transcript overlay even though there was no user message to rewind to.
That produced an empty header-only transcript view and exposed a rewind
flow that could not select a valid target.

## What changed

The backtrack flow now checks whether a user-message rewind target
exists before opening the transcript preview. If no target exists, Codex
stays in the main TUI and shows `No previous message to edit.` instead
of opening an empty overlay.

The same guard applies when starting rewind preview from the transcript
overlay, and the first `Esc` no longer advertises the “edit previous
message” hint when there is no previous message available.

Snapshot coverage was added for the unavailable rewind info message,
along with a small target-detection test.
2026-04-27 09:51:12 -07:00
jif-oai
bb83eec825 chore: split memories part 1 (#19818)
Extract memories into 2 different crates
2026-04-27 16:01:05 +02:00
jif-oai
f431ec12c9 nit: one more fix (#19813)
Fix this:
https://github.com/openai/codex/pull/19812#discussion_r3147529230
2026-04-27 15:32:31 +02:00
jif-oai
79b4f691a6 Avoid rewriting Phase 2 selection on clean workspace (#19812)
## Why

Phase 2 can now claim the global consolidation lock on startup even when
the git-backed memory workspace is already clean. The clean-workspace
path still finalized through the normal Phase 2 success path, which
clears and re-marks `selected_for_phase2` rows. That made no-op startups
perform avoidable writes to `stage1_outputs`, creating unnecessary DB
I/O and contention when no memory files changed.

## What Changed

- Added a preserving-selection Phase 2 finalizer in `codex-state` that
only marks the global job row as succeeded.
- Kept the existing `mark_global_phase2_job_succeeded` behavior for real
consolidation runs, where the selected Phase 2 snapshot must be
rewritten.
- Switched the `succeeded_no_workspace_changes` branch in
`core/src/memories/phase2.rs` to use the preserving-selection finalizer.
- Added a regression test that installs a SQLite trigger on
`stage1_outputs` and verifies the clean finalizer performs zero updates
there.

## Testing

- `cargo test -p codex-state`
- `cargo test -p codex-core memories::tests::phase2`
2026-04-27 15:14:16 +02:00
jif-oai
5d314f324c Allow Phase 2 memory claims after retry exhaustion (#19809)
## Why

The Phase 2 memories job row is only the global lock for the git-backed
memory workspace. Manual memory edits do not enqueue new Stage 1 work,
so a Phase 2 row with `retry_remaining = 0` could be skipped before the
worker ever claimed the lock and generated `phase2_workspace_diff.md`.

That left workspace-only changes unconsolidated after repeated failures,
even when retry backoff had elapsed and the filesystem had real diffable
work.

## What Changed

- Allow `try_claim_global_phase2_job` to claim the Phase 2 lock after
the retry budget is exhausted, while still respecting active `retry_at`
backoff and fresh running leases.
- Treat `SkippedRetryUnavailable` for Phase 2 as backoff-only, and
update the outcome docs to match.
- Clamp Phase 2 retry bookkeeping at zero when failed attempts are
recorded.

## Verification

- Added
`phase2_global_lock_can_be_claimed_after_retry_budget_is_exhausted` to
cover the exhausted-budget lock claim path.
- Ran `cargo test -p codex-state`.
2026-04-27 14:58:11 +02:00
jif-oai
01ab25dbb5 feat: use git-backed workspace diffs for memory consolidation (#18982)
## Why

This PR make the `morpheus` agent (memory phase 2) use a git diff to
start it's consolidation. The workflow is the following:
1. The agent acquire a lock
2. If `.codex/memories` does not exist or is not a git root, initialize
everything (and make a first empty commit)
3. Update `raw_memories.md` and `rollout_summaries/` as before.
Basically we select max N phase 1 memories based on a given policy
4. We use git (`gix`) to get a diff between the current state of
`.codex/memories` and the last commit.
5. Dump the diff in `phase2_workspace_diff.md`
6. Spawn `morpheus` and point it to `phase2_workspace_diff.md`
7. Wait for `morpheus` to be done
8. Re-create a new `.git` and make one single commit on it. We do this
because we don't want to preserve history through `.git` and this is
cheap anyway
9. We release the lock
On top of this, we keep the retry policies etc etc

The goals of this new workflow are:
* Better support of any memory extensions such as `chronicle`
* Allow the user to manually edit memories and this will be considered
by the phase 2 agent
 
As a follow-up we will need to add support for user's edition while
`morpheus` is running

## What Changed

- Added memory workspace helpers that prepare the git baseline, compute
the diff, write `phase2_workspace_diff.md`, and reset the baseline after
successful consolidation.
- Updated Phase 2 to sync current inputs into `raw_memories.md` and
`rollout_summaries/`, prune old extension resources, skip clean
workspaces, and run the consolidation subagent only when the workspace
has changes.
- Tightened Phase 2 job ownership around long-running consolidation with
heartbeats and an ownership check before resetting the baseline.
- Simplified the prompt and state APIs so DB watermarks are bookkeeping,
while workspace dirtiness decides whether consolidation work exists.
- Updated the memory pipeline README and tests for workspace diffs,
extension-resource cleanup, pollution-driven forgetting, selection
ranking, and baseline persistence.

## Verification

- Added/updated coverage in `core/src/memories/tests.rs`,
`core/src/memories/workspace_tests.rs`, `state/src/runtime/memories.rs`,
and `core/tests/suite/memories.rs`.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-27 14:32:44 +02:00
jif-oai
f8c527e529 multi_agent_v2: move thread cap into feature config (#19792)
## Why

`features.multi_agent_v2.max_concurrent_threads_per_session` is meant to
be the MultiAgentV2-specific session thread cap: it counts the root
thread and all open subagent threads. The previous implementation kept
this surface tied to `agents.max_threads`, which made it a global
subagent-only cap and allowed the legacy setting to coexist with
MultiAgentV2.

## What Changed

- Added `max_concurrent_threads_per_session` to
`[features.multi_agent_v2]` with default `4`.
- Removed the `[agents] max_concurrent_threads_per_session` alias to
`agents.max_threads`.
- When MultiAgentV2 is enabled, reject `agents.max_threads` and derive
the existing internal subagent slot limit as
`max_concurrent_threads_per_session - 1`.
- Regenerated `core/config.schema.json` and added coverage for the new
config semantics.

## Result
```
➜  codex git:(jif/clean-multi-agent-v2-config) codex -c features.multi_agent_v2.enabled=true -c features.multi_agent_v2.max_concurrent_threads_per_session=3
╭────────────────────────────────────────────────────╮
│ >_ OpenAI Codex (v0.0.0)                           │
│                                                    │
│ model:     gpt-5.5 xhigh   fast   /model to change │
│ directory: ~/code/codex                            │
╰────────────────────────────────────────────────────╯

  Tip: Update Required - This version will no longer be supported starting May 8th. Please upgrade to the latest version (https://github.com/openai/codex/releases/latest) using your preferred package manager.

› Can you try to spawn 4 agents


• I’ll try to start four lightweight agents at once and report exactly what the runtime accepts.

• Spawned Russell [no-apps] (gpt-5.5 xhigh)
  └ Spawn probe 1: reply briefly that you started, then wait for further instructions. Do not do any repo work.

• Spawned Descartes [no-apps] (gpt-5.5 xhigh)
  └ Spawn probe 2: reply briefly that you started, then wait for further instructions. Do not do any repo work.

• Agent spawn failed
  └ Spawn probe 3: reply briefly that you started, then wait for further instructions. Do not do any repo work.

• Agent spawn failed
  └ Spawn probe 4: reply briefly that you started, then wait for further instructions. Do not do any repo work.

• The runtime accepted the first two and rejected the next two with agent thread limit reached. I’m checking whether the two accepted probes have returned cleanly, then I’ll close them if needed.
```

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-27 13:31:56 +02:00
Eric Traut
4f1d5f00f0 Add Codex issue digest skill (#19779)
Problem: Maintainers need a shared way to run Codex GitHub issue digests
without copying large prompts or relying on manual GitHub page
summaries.

Solution: Add a reusable codex-issue-digest skill with a deterministic
GitHub collector, owner/all-label windows, reaction-aware activity
metrics, scaled attention markers, and focused tests.
2026-04-26 23:16:43 -07:00
Michael Bolin
a6ca39c630 permissions: derive legacy exec policies at boundaries (#19737)
## Why

After config and requirements store canonical profiles, exec requests
should not cache a derived `SandboxPolicy`. The cached legacy value can
drift from the richer profile state, and most execution paths already
have the filesystem and network runtime policies they need.

## What Changed

- Removes `sandbox_policy` from `codex_sandboxing::SandboxExecRequest`
and `codex_core::sandboxing::ExecRequest`.
- Adds an on-demand `ExecRequest::compatibility_sandbox_policy()` helper
for the Windows and legacy call sites that still need a `SandboxPolicy`
projection.
- Updates Windows filesystem override setup and unified exec policy
serialization to derive that compatibility policy at the boundary.
- Updates Unix escalation reruns and direct shell requests to
reconstruct exec requests from `PermissionProfile` plus runtime
filesystem/network policy, without carrying a cached legacy policy.
- Adjusts sandboxing manager tests to assert the effective profile
rather than the removed legacy field.

## Verification

- `cargo check -p codex-config -p codex-core -p codex-sandboxing -p
codex-app-server -p codex-cli -p codex-tui`
- `cargo test -p codex-sandboxing manager`
- `cargo test -p codex-core
exec_server_params_use_env_policy_overlay_contract`
- `cargo test -p codex-core unix_escalation`
- `cargo test -p codex-core exec::tests`
- `cargo test -p codex-core sandboxing::tests`
2026-04-26 22:11:49 -07:00
Michael Bolin
523e4aa8e3 permissions: constrain requirements as profiles (#19736) 2026-04-26 21:49:30 -07:00
Michael Bolin
0ccd659b4b permissions: store only constrained permission profiles (#19735) 2026-04-26 20:59:58 -07:00
Won Park
8033b6a449 Add /auto-review-denials retry approval flow (#19058)
## Why

Auto-review can deny an action that the user later decides they want to
retry. Today there is no TUI surface for selecting a recent denial and
sending explicit approval context back into the session, so users have
to restate intent manually and the retry can be reviewed without the
original denied action context.

This adds a narrow TUI-driven path for approving a recent denied action
while still keeping the retry inside the normal auto-review flow.

## What Changed

- Added `/auto-review-denials` to open a picker of recent denied
auto-review actions.
- Added a small in-memory TUI store for the 10 most recent denied
auto-review events.
- Selecting a denial sends the structured denied event back through the
existing core/app-server op path.
- Core now injects a developer message containing the approved action
JSON rather than the full assessment event.
- Auto-review transcript collection now preserves this specific approval
developer message so follow-up review sessions can see the user approval
context.
- Added TUI snapshot/unit coverage for the picker and approval dispatch
path.
- Added core coverage for retaining the approval developer message in
the auto-review transcript.

## Verification

- `cargo test -p codex-core
collect_guardian_transcript_entries_keeps_manual_approval_developer_message`
- `cargo test -p codex-tui auto_review_denials`
- `cargo test -p codex-tui
approving_recent_denial_emits_structured_core_op_once`

## Notes

This intentionally keeps retries going through auto-review. The approval
signal is context for the exact previously denied action, not a blanket
bypass for similar future actions.
2026-04-27 03:43:53 +00:00
Michael Bolin
0d8cdc0510 permissions: centralize legacy sandbox projection (#19734)
## Why

The remaining migration work still needs `SandboxPolicy` at a few
compatibility boundaries, but those projections should come from one
canonical path. Keeping ad hoc legacy projections scattered through
app-server, CLI, and config code makes it easy for behavior to drift as
`PermissionProfile` gains fidelity that the legacy enum cannot
represent.

## What Changed

- Adds `Permissions::legacy_sandbox_policy(cwd)` and
`Config::legacy_sandbox_policy()` as the compatibility projection from
the canonical `PermissionProfile`.
- Adds `Permissions::can_set_legacy_sandbox_policy()` so legacy inputs
are checked after they are converted into profile semantics.
- Updates app-server command handling, Windows sandbox setup, session
configuration, and sandbox summaries to use the centralized projection
helper.
- Leaves `SandboxPolicy` in place only for boundary inputs/outputs that
still speak the legacy abstraction.

## Verification

- `cargo check -p codex-config -p codex-core -p codex-sandboxing -p
codex-app-server -p codex-cli -p codex-tui`
- `cargo test -p codex-tui
permissions_selection_history_snapshot_full_access_to_default --
--nocapture`
- `cargo test -p codex-tui
permissions_selection_sends_approvals_reviewer_in_override_turn_context
-- --nocapture`
- `bazel test //codex-rs/tui:tui-unit-tests-bin
--test_arg=permissions_selection_history_snapshot_full_access_to_default
--test_output=errors`
- `bazel test //codex-rs/tui:tui-unit-tests-bin
--test_arg=permissions_selection_sends_approvals_reviewer_in_override_turn_context
--test_output=errors`


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19734).
* #19737
* #19736
* #19735
* __->__ #19734
2026-04-26 20:31:23 -07:00
Abhinav
c3e60849e5 inline hostname resolution for remote sandbox config (#19739)
# Why

Requirements support host-specific
`remote_sandbox_config.hostname_patterns`, but config loading previously
resolved and passed the system hostname through every config-loading
path even when no requirements layer used `remote_sandbox_config`. On
machines where hostname lookup is slow, startup and app-server config
reads paid for a feature that was not active.

We only need the hostname when a requirements layer actually declares
`remote_sandbox_config`, so this moves hostname resolution to the single
requirements merge point and keeps all other config callers unaware of
hostname matching.

# What

- Removed the eager `host_name` plumbing from
`load_config_layers_state`, `load_requirements_toml`, `ConfigBuilder`,
app-server `ConfigManager`, network proxy loading, and related call
sites.
- Resolve the hostname inside
`merge_requirements_with_remote_sandbox_config` only when the incoming
requirements contain `remote_sandbox_config`.
2026-04-27 03:18:57 +00:00
Michael Bolin
ad57a3fee2 permissions: finish profile-backed app surfaces (#19395) 2026-04-26 19:42:39 -07:00
Andrey Mishchenko
1f304dd1f2 Allow agents.max_threads to work with multi_agent_v2 (#19733) 2026-04-26 17:56:05 -07:00
Michael Bolin
2cb8746457 permissions: remove core legacy policy round trips (#19394)
## Why

Several execution paths still converted profile-backed permissions into
`SandboxPolicy` and then rebuilt runtime permissions from that legacy
shape. Those round trips are unnecessary after the preceding PRs and can
lose split filesystem semantics. Core approval and escalation should
carry the resolved profile directly.

## What Changed

- Removes `sandbox_policy` from `ResolvedPermissionProfile`; the
resolved permission object now carries the canonical `PermissionProfile`
directly.
- Updates exec-policy fallback, shell/unified-exec interception,
escalation reruns, and related tests to pass profiles instead of legacy
policies.
- Removes legacy additional-permission merge helpers that built an
effective `SandboxPolicy` before rebuilding runtime permissions.
- Keeps legacy projections only at compatibility boundaries that still
require `SandboxPolicy`, not in core permission computation.

## Verification

- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`







































































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19394).
* #19737
* #19736
* #19735
* #19734
* #19395
* __->__ #19394
2026-04-26 17:43:32 -07:00
Andrey Mishchenko
35bc6e3d01 Delete unused ResponseItem::Message.end_turn (#19605)
This field is unused. Delete it.
2026-04-26 17:18:09 -07:00
Ahmed Ibrahim
0bda8161a2 Split MCP connection modules (#19725)
## Why

The MCP connection manager module had grown to mix orchestration, RMCP
client startup, elicitation handling, Codex Apps cache and naming
behavior, tool qualification and filtering, and runtime data. The
previous stacked PRs split these responsibilities incrementally; this PR
collapses that work into one self-contained refactor on latest main.

## What changed

- Move McpConnectionManager into connection_manager.rs.
- Move RMCP client lifecycle, startup, and uncached tool listing into
rmcp_client.rs.
- Move elicitation request tracking and policy handling into
elicitation.rs.
- Move Codex Apps cache, key, filtering, and naming helpers into
codex_apps.rs.
- Rename the tool-name helper module to tools.rs and move ToolInfo, tool
filtering, schema masking, and qualification there.
- Move runtime and sandbox shared types into runtime.rs.
- Preserve latest main PermissionProfile-based MCP elicitation
auto-approval behavior.

## Verification

- just fmt
- cargo check -p codex-mcp
- cargo check -p codex-mcp --tests
- cargo check -p codex-core

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-26 23:23:34 +00:00
Michael Bolin
4c58e64f08 test: increase core-all-test shard count to 16 (#19727)
## Summary

Increase `core-all-test`'s Bazel shard count from `8` to `16`.

## Why

[#19609](https://github.com/openai/codex/pull/19609) restored
`bazel.yml` to a 30-minute timeout and increased `app-server-all-test`'s
shard count because the bigger timeout risk was not just a cold Windows
build. The more common problem was a long `rust_test()` shard failing
and getting retried multiple times.

Recent `main` runs show that `//codex-rs/core:core-all-test` still has
the same shape of problem on Windows:

- [Run
24943931330](https://github.com/openai/codex/actions/runs/24943931330)
reported `//codex-rs/core:core-all-test` as flaky after first-attempt
failures in shard `5/8` and shard `8/8`.
- Those retries were driven by
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override`
and
`suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request`.
- The failed shard attempts in that run took `272.61s` and `259.27s`
before retrying, which is exactly the sort of wall-clock cost that burns
through the 30-minute budget.
- [Run
24966332583](https://github.com/openai/codex/actions/runs/24966332583)
also retried `//codex-rs/tui:tui-unit-tests` after
`app::tests::update_memory_settings_updates_current_thread_memory_mode`
failed once on Windows.
- [Run
24965527138](https://github.com/openai/codex/actions/runs/24965527138)
and its linked [BuildBuddy
invocation](https://app.buildbuddy.io/invocation/ac1a8265-06fa-4da5-9552-4715b7965bce)
show the other half of the problem: when Windows cache reuse is weak,
the `bazel test //...` step can already consume `24m11s` on its own,
leaving very little headroom for flaky retries.

Increasing `core-all-test` to `16` shards does not fix the flaky tests,
but it does reduce the wall-clock cost when a single shard has to be
retried. That matches the mitigation we already applied to
`app-server-all-test` in `#19609`.

## What Changed

- Update `codex-rs/core/BUILD.bazel` so `core-all-test` uses `16` shards
instead of `8`.
- Leave `core-unit-tests` unchanged.

## Follow-up Work

This change is meant to buy back CI headroom while we fix the flaky
tests themselves in subsequent commits. The recent Windows retries that
look worth addressing directly include:

-
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override`
-
`suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request`
-
`app::tests::update_memory_settings_updates_current_thread_memory_mode`

## Verification

- Compared `core-all-test`'s current sharding against the
`app-server-all-test` precedent in
[#19609](https://github.com/openai/codex/pull/19609).
- Inspected recent `main` Bazel workflow logs and the linked BuildBuddy
invocation to confirm that Windows retries on long shards are still
consuming a meaningful fraction of the 30-minute timeout budget.
- Did not run local tests for this change because it only adjusts Bazel
sharding metadata.
2026-04-26 23:10:26 +00:00
pakrym-oai
ba159cbc79 Fix codex-core config test type paths (#19726)
Summary:
- Update config tests to reference config requirement types from
codex_config after the loader split.

Tests:
- just fmt
- cargo build -p codex-core --tests
- cargo clippy -p codex-core --tests -- -D warnings
2026-04-26 15:58:17 -07:00
Michael Bolin
dda8199b73 permissions: migrate approval and sandbox consumers to profiles (#19393)
## Why

Runtime decisions should not infer permissions from the lossy legacy
sandbox projection once `PermissionProfile` is available. In particular,
`Disabled` and `External` need to remain distinct, and managed profiles
with split filesystem or deny-read rules should not be collapsed before
approval, network, safety, or analytics code makes decisions.

## What Changed

- Changes managed network proxy setup and network approval logic to use
`PermissionProfile` when deciding whether a managed sandbox is active.
- Migrates patch safety, Guardian/user-shell approval paths, Landlock
helper setup, analytics sandbox classification, and selected
turn/session code to profile-backed permissions.
- Validates command-level profile overrides against the constrained
`PermissionProfile` rather than a strict `SandboxPolicy` round trip.
- Preserves configured deny-read restrictions when command profiles are
narrowed.
- Adds coverage for profile-backed trust, network proxy/approval
behavior, patch safety, analytics classification, and command-profile
narrowing.

## Verification

- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`




































































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19393).
* #19395
* #19394
* __->__ #19393
2026-04-26 15:30:40 -07:00
pakrym-oai
9c3abcd46c [codex] Move config loading into codex-config (#19487)
## Why

Config loading had become split across crates: `codex-config` owned the
config types and merge logic, while `codex-core` still owned the loader
that assembled the layer stack. This change consolidates that
responsibility in `codex-config`, so the crate that defines config
behavior also owns how configs are discovered and loaded.

To make that move possible without reintroducing the old dependency
cycle, the shell-environment policy types and helpers that
`codex-exec-server` needs now live in `codex-protocol` instead of
flowing through `codex-config`.

This also makes the migrated loader tests more deterministic on machines
that already have managed or system Codex config installed by letting
tests override the system config and requirements paths instead of
reading the host's `/etc/codex`.

## What Changed

- moved the config loader implementation from `codex-core` into
`codex-config::loader` and deleted the old `core::config_loader` module
instead of leaving a compatibility shim
- moved shell-environment policy types and helpers into
`codex-protocol`, then updated `codex-exec-server` and other downstream
crates to import them from their new home
- updated downstream callers to use loader/config APIs from
`codex-config`
- added test-only loader overrides for system config and requirements
paths so loader-focused tests do not depend on host-managed config state
- cleaned up now-unused dependency entries and platform-specific cfgs
that were surfaced by post-push CI

## Testing

- `cargo test -p codex-config`
- `cargo test -p codex-core config_loader_tests::`
- `cargo test -p codex-protocol -p codex-exec-server -p
codex-cloud-requirements -p codex-rmcp-client --lib`
- `cargo test --lib -p codex-app-server-client -p codex-exec`
- `cargo test --no-run --lib -p codex-app-server`
- `cargo test -p codex-linux-sandbox --lib`
- `cargo shear`
- `just bazel-lock-check`

## Notes

- I did not chase unrelated full-suite failures outside the migrated
loader surface.
- `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive
failures on this machine, and Windows CI still shows unrelated
long-running/timeouting test noise outside the loader migration itself.
2026-04-26 15:10:53 -07:00
pakrym-oai
2a020f1a0a Lift app-server JSON-RPC error handling to request boundary (#19484)
## Why

App-server request handling had a lot of repeated JSON-RPC error
construction and one-off `send_error`/`return` branches. This made small
handlers noisy and pushed error response details into leaf code that
otherwise only needed to validate input or call the underlying API.

## What Changed

- Added shared JSON-RPC error constructors in
`codex-rs/app-server/src/error_code.rs`.
- Lifted straightforward request result emission into
`codex-rs/app-server/src/message_processor.rs` so response/error
dispatch happens at the request boundary.
- Reused the result helpers across command exec, config, filesystem,
device-key, external-agent config, fs-watch, and outgoing-message paths.
- Removed leaf wrapper handlers where the method body was only
forwarding to a response helper.
- Returned request validation errors upward in the simple cases instead
of sending an error locally and immediately returning.

## Verification

- `cargo test -p codex-app-server --lib command_exec::tests`
- `cargo test -p codex-app-server --lib outgoing_message::tests`
- `cargo test -p codex-app-server --lib in_process::tests`
- `cargo test -p codex-app-server --test all v2::fs`
- `cargo test -p codex-app-server --test all v2::config_rpc`
- `cargo test -p codex-app-server --test all v2::external_agent_config`
- `cargo test -p codex-app-server --test all v2::initialize`
- `just fix -p codex-app-server`
- `git diff --check`

Note: full `cargo test -p codex-app-server` was attempted and stopped in
`message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
with a stack overflow after unrelated tests had already passed.
2026-04-26 15:10:35 -07:00
Michael Bolin
deaa307fb2 permissions: derive compatibility policies from profiles (#19392)
## Why

After #19391, `PermissionProfile` and the split filesystem/network
policies could still be stored in parallel. That creates drift risk: a
profile can preserve deny globs, external enforcement, or split
filesystem entries while a cached projection silently loses those
details. This PR makes the profile the runtime source and derives
compatibility views from it.

## What Changed

- Removes stored filesystem/network sandbox projections from
`Permissions` and `SessionConfiguration`; their accessors now derive
from the canonical `PermissionProfile`.
- Derives legacy `SandboxPolicy` snapshots from profiles only where an
older API still needs that field.
- Updates MCP connection and elicitation state to track
`PermissionProfile` instead of `SandboxPolicy` for auto-approval
decisions.
- Adds semantic filesystem-policy comparison so cwd changes can preserve
richer profiles while still recognizing equivalent legacy projections
independent of entry ordering.
- Updates config/session tests to assert profile-derived projections
instead of parallel stored fields.

## Verification

- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`



































































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19392).
* #19395
* #19394
* #19393
* __->__ #19392
2026-04-26 15:06:42 -07:00
Michael Bolin
4d7ce3447d permissions: make runtime config profile-backed (#19606)
## Why

This supersedes #19391. During stack repair, GitHub marked #19391 as
merged into a temporary stack branch rather than into `main`, so the
runtime-config change needed a fresh PR.

`PermissionProfile` is now the canonical permissions shape after #19231
because it can distinguish `Managed`, `Disabled`, and `External`
enforcement while also carrying filesystem rules that legacy
`SandboxPolicy` cannot represent cleanly. Core config and session state
still needed to accept profile-backed permissions without forcing every
profile through the strict legacy bridge, which rejected valid runtime
profiles such as direct write roots.

The unrelated CI/test hardening that previously rode along with this PR
has been split into #19683 so this PR stays focused on the permissions
model migration.

## What Changed

- Adds `Permissions.permission_profile` and
`SessionConfiguration.permission_profile` as constrained runtime state,
while keeping `sandbox_policy` as a legacy compatibility projection.
- Introduces profile setters that keep `PermissionProfile`, split
filesystem/network policies, and legacy `SandboxPolicy` projections
synchronized.
- Uses a compatibility projection for requirement checks and legacy
consumers instead of rejecting profiles that cannot round-trip through
`SandboxPolicy` exactly.
- Updates config loading, config overrides, session updates, turn
context plumbing, prompt permission text, sandbox tags, and exec request
construction to carry profile-backed runtime permissions.
- Preserves configured deny-read entries and `glob_scan_max_depth` when
command/session profiles are narrowed.
- Adds `PermissionProfile::read_only()` and
`PermissionProfile::workspace_write()` presets that match legacy
defaults.

## Verification

- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606).
* #19395
* #19394
* #19393
* #19392
* __->__ #19606
2026-04-26 13:29:54 -07:00
efrazer-oai
fed0a8f4fa feat: load AgentIdentity from JWT login/env (#18904)
## Summary

This PR lets programmatic AgentIdentity users provide one token through
either stdin login or environment auth.

`codex login --with-agent-identity` reads an Agent Identity JWT from
stdin, validates that it has the required claims, and stores that token
as the `agent_identity` value in `auth.json`. The file format is
token-only; the decoded account and key fields are runtime state, not
hand-authored auth.json fields.

The Agent Identity JWT claim shape and decoder live in
`codex-agent-identity`; `codex-login` only owns env/storage precedence
and conversion into `CodexAuth::AgentIdentity`.

When env auth is enabled, `CODEX_AGENT_IDENTITY` can provide the same
JWT without writing auth state to disk. `CODEX_API_KEY` still wins if
both env vars are set.

Reference old stack: https://github.com/openai/codex/pull/17387/changes
Reference JWT/env stack: https://github.com/openai/codex/pull/18176

## Stack

1. https://github.com/openai/codex/pull/18757: full revert
2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
crate
3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity
auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate Codex backend
auth callsites through AuthProvider
5. This PR: accept AgentIdentity JWTs through login/env

## Testing

Tests: targeted login and Agent Identity crate tests, CLI checks, scoped
formatter/linter cleanup, and CI.

---------

Co-authored-by: Shijie Rao <shijie.rao@openai.com>
2026-04-26 19:49:54 +00:00
Michael Bolin
ac2bffa443 test: harden app-server integration tests (#19683)
## Why

Windows Bazel runs in the permissions stack exposed that app-server
integration tests were launching normal plugin startup warmups in every
subprocess. Those warmups can call
`https://chatgpt.com/backend-api/plugins/featured` when a test is not
specifically exercising plugin startup, which adds slow background work,
noisy stderr, and dependence on external network state. The relevant
startup/featured-plugin behavior was introduced across #15042 and
#15264.

A few app-server tests also had long optional waits or unbounded cleanup
paths, making failures expensive to diagnose and contributing to slow
Windows shards. One external-agent config test from #18246 used a
GitHub-style marketplace source, which was enough to exercise the
pending remote-import path but also meant the background completion task
could attempt a real clone.

## What Changed

- Adds explicit `AppServerRuntimeOptions` / `PluginStartupTasks`
plumbing and a hidden debug-only
`--disable-plugin-startup-tasks-for-tests` app-server flag, so
integration tests can suppress startup plugin warmups without adding a
production env-var gate.
- Has the app-server test harness pass that hidden flag by default,
while opting plugin-startup coverage back in for tests that
intentionally exercise startup sync and featured-plugin warmup behavior.
- Lowers normal app-server subprocess logging from `info`/`debug` to
`warn` to avoid multi-megabyte stderr output in Bazel logs.
- Prevents the external-agent config test from attempting a real
marketplace clone by using an invalid non-local source while still
exercising the pending-import completion path.
- Bounds optional filesystem/realtime waits and fake WebSocket
test-server shutdown so failures produce targeted timeouts instead of
hanging a shard.
- Fixes the Unix script-resolution test in `rmcp-client` to exercise
PATH resolution directly and include the actual spawn error in failures.

## Verification

- `cargo check -p codex-app-server`
- `cargo clippy -p codex-app-server --tests -- -D warnings`
- `cargo test -p codex-rmcp-client
program_resolver::tests::test_unix_executes_script_without_extension`
- `cargo test -p codex-app-server --test all
external_agent_config_import_sends_completion_notification_after_pending_plugins_finish
-- --nocapture`
- `cargo test -p codex-app-server --test all
plugin_list_uses_warmed_featured_plugin_ids_cache_on_first_request --
--nocapture`
- Windows Local Bazel passed with this test-hardening bundle before it
was extracted from #19606.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19683).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19683
2026-04-26 12:43:16 -07:00
Thibault Sottiaux
87bc72408c [codex] remove responses command (#19640)
This removes the hidden `codex responses` CLI subcommand after
confirming no downstream callers rely on it, deleting the raw Responses
passthrough implementation, unregistering the subcommand, and dropping
the now-unused CLI dependencies on `codex-api` and
`codex-model-provider`.
2026-04-25 23:10:38 -07:00
Andrey Mishchenko
355c40ad7e Support end_turn in response.completed (#19610)
Some providers of Responses API forward a model-defined `end_turn`
boolean indicating explicitly the model's indication of whether it would
like to end the turn or to be inferenced again. In this PR, we update
the sampling loop to use this field correctly if it's set. If the field
is not set by the provider, we fall back to the existing sampling logic.
2026-04-25 21:57:42 -07:00
Felipe Coury
5591912f0b fix(tui): reflow scrollback on terminal resize (#18575)
Fixes multiple scrollback and terminal resize issues: #5538, #5576,
#8352, #12223, #16165, and #15380.

## Why

Codex writes finalized transcript output into terminal scrollback after
wrapping it for the current viewport width. A later terminal resize
could leave that scrollback shaped for the old width, so wider windows
kept narrow output and narrower windows could show stale wrapping
artifacts until enough new output replaced the visible area.

This is also the foundation PR for responsive markdown tables. Table
rendering needs finalized transcript content to be width-sensitive after
insertion, not only while content is first streaming. Markdown table
rendering itself stays in #18576.

## Stack

- PR1: resize backlog reflow and interrupt cleanup
- #18576: markdown table support

## What Changed

- Rebuild source-backed transcript history when the terminal width
changes. `terminal_resize_reflow` is introduced through the experimental
feature system, but is enabled by default for this rollout so we can
validate behavior across real terminals.
- Preserve assistant and plan stream source so finalized streaming
output can participate in resize reflow after consolidation.
- Debounce resize work, but force a final source-backed reflow when a
resize happened during active or unconsolidated streaming output.
- Clear stale pending history lines on resize so old-width wrapped
output is not emitted just before rebuilt scrollback.
- Bound replay work with `[tui.terminal_resize_reflow].max_rows`:
omitted uses terminal-specific defaults, `0` keeps all rendered rows,
and a positive value sets an explicit cap. The cap applies both while
initially replaying a resumed transcript into scrollback and when
rebuilding scrollback after terminal resize.
- Consolidate interrupted assistant streams before cleanup, then clear
pending stream output and active-tail state consistently.
- Move resize reflow and thread event buffering helpers out of `app.rs`
into dedicated TUI modules.
- Add focused coverage for resize reflow, feature-gated behavior,
streaming source preservation, interrupted output cleanup,
unicode-neutral text, terminal-specific row caps, and composer/layout
stability.

## Runtime Bounds

Resize reflow keeps only the most recent rendered rows when a row cap is
active. The default is `auto`, which maps to the detected terminal's
default scrollback size where Codex can identify it: VS Code `1000`,
Windows Terminal `9001`, WezTerm `3500`, and Alacritty `10000`.
Terminals without a dedicated mapping use the conservative fallback of
`1000` rows. Users can override this with `[tui.terminal_resize_reflow]
max_rows = N`, or set `max_rows = 0` to disable row limiting.

## Validation

- `just fmt`
- `git diff --check`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui reflow`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui
transcript_reflow`
- `just fix -p codex-tui`
- PR CI in progress on the squashed branch
2026-04-25 22:00:32 -03:00
Shijie Rao
4e30281a13 Guard npm update readiness (#19389)
## Why
For npm/Bun-managed installs, the update prompt was treating the latest
GitHub release as ready to install. During the `0.124.0` release, GitHub
and npm visibility were not atomic: the root npm wrapper could become
visible before the npm registry marked that version as the package
`latest`. That left a window where users could be prompted to upgrade
before npm was ready for the release.

## What changed
- Keep GitHub Releases as the candidate latest-version source for
npm/Bun installs, but only write the existing `version.json` cache after
npm registry metadata proves that same root version is ready.
- Add `codex-rs/tui/src/npm_registry.rs` to validate npm readiness by
checking `dist-tags.latest` and root package `dist` metadata for the
GitHub candidate version.
- Move version parsing helpers into
`codex-rs/tui/src/update_versions.rs` so that logic can be tested
without compiling the release-only `updates.rs` module under tests.
- Update `.github/workflows/rust-release.yml` so the six known platform
tarballs publish before the root `@openai/codex` wrapper. Other npm
tarballs publish before the root wrapper, and the SDK publishes after
the root package it depends on.
2026-04-25 17:09:29 -07:00
Michael Bolin
9881dc7306 fix: restore 30-minute timeout for Bazel builds (#19609)
I think raising it to 45 minutes in
https://github.com/openai/codex/pull/19578 was a mistake for the reasons
explained in the comments in the code. Instead, we attempt to defend
against timeouts by increasing the number of shards in
`app-server-all-test` so that a "true failure" that gets run 3x should
not take as much wall clock time.
2026-04-25 16:34:06 -07:00
Michael Bolin
d54493ba1c test: stabilize app-server path assertions on Windows (#19604)
## Why

Windows can represent the same canonical local path with either a normal
drive path or a verbatim device path prefix. The failure pattern that
motivated this PR was an assertion diff like `C:\...` versus
`\\?\C:\...`: different spellings, same file.

That became visible while validating the permissions stack above this
PR. The stack increasingly routes paths through `AbsolutePathBuf`, which
normalizes supported Windows device prefixes, while several existing
tests still built expected values directly with
`std::fs::canonicalize()` or compared `AbsolutePathBuf::as_path()` to a
raw `PathBuf`. On Windows, that can make tests fail because the two
sides choose different textual forms for an otherwise equivalent
canonical path.

This PR is intentionally split out as the bottom PR below #19606. The
runtime permissions migration should not carry unrelated Windows test
stabilization, and reviewers should be able to verify this as a
test-only change before looking at the larger permissions changes.

## Failure Modes Covered

- `conversation_summary` expected rollout paths were built from raw
canonicalized `PathBuf`s, while app-server responses could carry
`AbsolutePathBuf`-normalized paths.
- `thread_resume` compared returned thread paths directly to previously
stored or fixture paths, so a verbatim-prefix spelling could fail an
otherwise correct resume.
- `marketplace_add` compared plugin install roots through `as_path()`
against raw canonicalized paths, reproducing the same `C:\...` versus
`\\?\C:\...` mismatch in both app-server and core-plugin coverage.

## What Changed

- In `app-server/tests/suite/conversation_summary.rs`, normalize both
expected rollout paths and received `ConversationSummary.path` values
through `AbsolutePathBuf` before comparing the full summary object.
- In `app-server/tests/suite/v2/thread_resume.rs`, normalize both sides
of thread path comparisons before asserting equality. This keeps the
tests focused on whether resume returned the same existing path, not
whether Windows used the same string spelling.
- In `app-server/tests/suite/v2/marketplace_add.rs` and
`core-plugins/src/marketplace_add.rs`, compare install roots as
`AbsolutePathBuf` values instead of comparing an absolute-path wrapper
to a raw canonicalized `PathBuf`.

## Behavior

This PR does not change production app-server or marketplace behavior.
It only changes tests to assert semantic path identity across Windows
path spelling variants. It also leaves API response values untouched;
the normalization happens inside assertions only.

## Verification

Targeted local checks run while extracting this fix:

- `cargo test -p codex-app-server
get_conversation_summary_by_thread_id_reads_rollout`
- `cargo test -p codex-app-server
get_conversation_summary_by_relative_rollout_path_resolves_from_codex_home`
- `cargo test -p codex-app-server
thread_resume_prefers_path_over_thread_id`

Windows-specific confidence comes from the Bazel Windows CI job for this
PR, since the failure is platform-specific.

## Docs

No docs update is needed because this is test-only infrastructure
stabilization.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19604).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19604
2026-04-25 16:25:28 -07:00
viyatb-oai
9aaa5d9358 [codex] Bypass managed network for escalated exec (#19595)
## Why

`sandbox_permissions = "require_escalated"` is treated as an explicit
request to approve the command and run it outside the
filesystem/platform sandbox. Before this change, shell and unified exec
still registered managed network approval context and could inject
Codex-managed proxy state into the child process, which meant an
approved escalated command could still hit a second network approval
path.

This PR makes that escalation boundary consistent: once a command is
explicitly approved to run outside the sandbox, Codex does not also
route that process through the managed network proxy.

## Security impact

Command/filesystem sandbox approval now implies network approval for
that command. If an untrusted command or script is allowed to run with
`require_escalated`, its network calls are unsandboxed: Codex-managed
network allowlists and denylists are not respected for that process, so
the command can exfiltrate any data it can read.

## What changed

- Skip managed network approval specs for
`SandboxPermissions::RequireEscalated`.
- Pass `network: None` into shell, zsh-fork shell, and unified exec
sandbox preparation for explicitly escalated requests.
- Strip Codex-managed proxy environment variables when
`CODEX_NETWORK_PROXY_ACTIVE` is present, while preserving user proxy env
when the Codex marker is absent.
- Add regression coverage for the prepared exec request so the old
behavior cannot silently reappear.

## Verification

- `cargo test -p codex-core explicit_escalation`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
2026-04-25 23:23:58 +00:00
Eric Traut
0c785598b3 Keep slash command popup columns stable while scrolling (#19511)
## Why

Fixes #19499.

The slash-command popup recalculated the command-name column from only
the rows visible in the current viewport. That made the description
column shift horizontally while scrolling through `/` commands whenever
longer command names entered or left the visible window.

## What Changed

`codex-rs/tui/src/bottom_pane/command_popup.rs` now uses the shared
selection-popup `AutoAllRows` column-width mode for both height
measurement and rendering. This keeps the command description column
based on the full filtered slash-command list instead of the current
viewport.

## Verification

- `cargo test -p codex-tui bottom_pane::command_popup`
2026-04-25 14:25:58 -07:00
Michael Bolin
f41306b4f3 test: isolate remote thread store regression from plugin warmups (#19593)
Follow-up to #19266.

## Why


`thread_start_with_non_local_thread_store_does_not_create_local_persistence`
is meant to catch accidental local thread persistence when a non-local
thread store is configured. The Windows flake reported in [this
BuildBuddy
invocation](https://app.buildbuddy.io/invocation/0b75dde4-6828-4e7b-a35b-e45b73fb005d)
showed that the assertion was tripping on an unexpected top-level `.tmp`
entry:

```diff
 {
+    ".tmp",
     "config.toml",
     "installation_id",
     "memories",
     "skills",
 }
```

That `.tmp` does not appear to come from `tempfile::TempDir`; it comes
from unrelated plugin startup work that can legitimately materialize
`codex_home/.tmp`, including the startup remote plugin sync marker in
[`core/src/plugins/startup_sync.rs`](bce74c70ce/codex-rs/core/src/plugins/startup_sync.rs (L13-L15))
and the curated plugin snapshot under
[`.tmp/plugins`](bce74c70ce/codex-rs/core-plugins/src/startup_sync.rs (L25-L26)).

That makes the regression race unrelated background startup tasks
instead of validating the thread-store invariant it was added to cover.
Rather than weakening the assertion to allow arbitrary `.tmp` entries,
this change isolates the test from plugin warmups so it can stay strict
about unexpected local thread persistence artifacts.

## What changed

- disable plugins in the generated config used by
`app-server/tests/suite/v2/remote_thread_store.rs`
- keep the existing `codex_home` assertions unchanged so the test still
fails if local session or sqlite persistence is introduced

## Verification

- `cargo test -p codex-app-server
suite::v2::remote_thread_store::thread_start_with_non_local_thread_store_does_not_create_local_persistence
-- --exact`
2026-04-25 20:45:31 +00:00
Eric Traut
bce74c70ce Restore persisted model provider on thread resume (#19287)
Fixes #15219.

## Why

`thread/resume` should continue a persisted thread with the same model
provider that created the thread. The app server already restores the
persisted model and reasoning effort before resuming, but it was leaving
`model_provider` unset. If a user created a thread with one provider and
later switched their active profile to another provider, resumed
encrypted history could be sent to the wrong endpoint and fail with
`invalid_encrypted_content`.

The thread metadata already records the original provider, so resume
should apply it when the caller has not explicitly requested a different
model/provider/reasoning configuration.

## What changed

This updates `merge_persisted_resume_metadata` in
`app-server/src/codex_message_processor.rs` to copy
`ThreadMetadata::model_provider` into `ConfigOverrides::model_provider`
alongside the persisted model.

The existing resume metadata tests now also assert that:

- the persisted provider is restored for normal resume
- explicit model, provider, or reasoning-effort overrides still prevent
persisted resume metadata from being applied
- a thread with no persisted model or reasoning effort still resumes
with its persisted provider

## Verification

- `cargo test -p codex-app-server` passed the app-server unit tests,
including the updated resume metadata coverage. The broader integration
portion of that command failed in an unrelated environment-sensitive
skills-budget warning assertion, where this run saw 8 omitted skills
instead of the expected 7.
- `just fix -p codex-app-server` completed successfully.
2026-04-25 12:40:00 -07:00
Michael Bolin
88f300d74d fix: increase Bazel timeout to 45 minutes (#19578)
Unfortunately, if most of the build graph is invalidated such that there
are few cache hits, the Windows Bazel build for all the tests often
takes more than `30` minutes, so this PR increases the timeout to `45`
minutes until we set up distributed builds.
2026-04-25 10:03:01 -07:00
Ahmed Ibrahim
022f81df1f [codex] Order codex-mcp items by visibility (#19526)
## Why

The visibility cleanup in the base PR reduced what `codex-mcp` exposes,
but several files still made reviewers read private support machinery
before the public or crate-facing entry points. This ordering pass makes
each file easier to scan: exported API first, crate-visible MCP
internals next, then private helpers in breadth-first order from the
higher-level MCP flows to leaf utilities.

## What Changed

- Reordered `codex-mcp` exports so the runtime, configuration, snapshot,
auth, and helper surfaces are grouped by visibility and reader
importance.
- Moved public and crate-visible MCP items ahead of private helpers in
the auth, MCP planning/snapshot, connection manager, and tool-name
modules.
- Kept the change mechanical, with no behavior changes intended.

## Verification

- `cargo check -p codex-mcp`
2026-04-25 07:17:30 -07:00
Ahmed Ibrahim
706490ab1b [codex] Prune unused codex-mcp API and duplicate helpers (#19524)
## Why

`codex-mcp` currently exposes more API than the rest of the workspace
uses. Some of that surface is simply visibility that can be tightened,
and some of it is public helper code that remains compiler-valid because
it is exported even though no workspace caller uses it.

That distinction matters: Rust does not warn on exported API just
because the current workspace does not call it. This PR intentionally
treats those exported-but-workspace-unreferenced paths as stale
`codex-mcp` surface. The main example is MCP skill dependency
collection, where the active implementation now lives in
`codex-rs/core/src/mcp_skill_dependencies.rs`; keeping the older
`codex-mcp` copy makes it unclear which implementation owns skill MCP
installation.

## What Changed

- Pruned unused `codex-mcp` re-exports from `codex-mcp/src/lib.rs`.
- Removed non-runtime helper methods from `McpConnectionManager` so it
stays focused on live MCP clients.
- Made `ToolPluginProvenance` lookup methods crate-private.
- Removed workspace-unreferenced snapshot wrapper APIs and
qualified-tool grouping helpers.
- Deleted the duplicate `codex-mcp` skill dependency module and tests
now that skill MCP dependency handling is owned by `core`.

## Verification

- `cargo check -p codex-mcp`
2026-04-25 06:36:07 -07:00
Matthew Zeng
6e838a19fa Enable unavailable dummy tools by default (#19459)
## Summary
- Mark `unavailable_dummy_tools` as a stable feature and enable it by
default
- Update the feature registry test to match the new default state

## Testing
- `just fmt`
- `cargo test -p codex-features`
2026-04-25 08:46:57 +00:00
Eric Traut
a2db6f97fb Fix codex-rs README grammar (#19514)
## Why

Issue #19418 points out a small grammar issue in `codex-rs/README.md`
under "Code Organization." The current sentence says "we hope this to
be," which reads awkwardly.

Fixes #19418.

## What changed

Updated the `core/` crate description so the sentence reads "we hope
this becomes a library crate."

## Verification

Documentation-only change. Reviewed the Markdown diff.
2026-04-24 23:31:47 -07:00
Dylan Hurd
f5497f4d65 Split approval matrix test groups (#19454)
## Why

Recent `main` CI repeatedly timed out in:

- `codex-core::all suite::approvals::approval_matrix_covers_all_modes`

It failed in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
[24905823212](https://github.com/openai/codex/actions/runs/24905823212),
[24903439629](https://github.com/openai/codex/actions/runs/24903439629),
[24903336028](https://github.com/openai/codex/actions/runs/24903336028),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).

The failure pattern was a 60s Linux remote timeout. Logs showed many
approval scenarios completing before the single matrix test timed out.

## Root Cause

`approval_matrix_covers_all_modes` packed every approval/sandbox/tool
scenario into one test case. That made the test vulnerable to normal CI
variance: one slow scenario or a slow process startup could push the
whole monolithic case past the 60s per-test timeout. It also hid which
part of the matrix was slow because the runner only reported the one
large matrix test.

## What Changed

- Keep the shared `scenarios()` table as the single source of approval
matrix coverage.
- Use one `#[test_case]` per `ScenarioGroup` to generate five async
Tokio tests: danger/full-access, read-only, workspace-write,
apply-patch, and unified-exec.
- Keep the group runner small and add per-scenario error context so a
failure still reports the specific scenario name.

## Why This Should Be Reliable

Each scenario group now has its own test harness timeout instead of
sharing one timeout window with the full matrix. That removes the long
sequential loop from a single test while keeping the implementation
compact and easy to scan.

The tests still run through the same scenario definitions and runner, so
this preserves coverage. `test-case` already composes with
`#[tokio::test]` in this crate and is already available for test code.

## Verification

- `cargo test -p codex-core --test all approval_matrix_ -- --list`
- `cargo test -p codex-core --test all approval_matrix_`
2026-04-24 21:38:27 -07:00
Eric Traut
f1c963d77e Add goal TUI UX (5 / 5) (#18077)
Adds the TUI user experience for goals on top of the core runtime from
PR 4.

## Why

Users need a direct TUI control surface for long-running goals. The UI
should make the current goal visible, support common goal actions
without waiting for a model turn, and avoid confusing end-of-turn
notifications while an active goal is immediately continuing.

## What changed

- Added `/goal` summary rendering for the current goal, including
active, paused, budget-limited, and complete states.
- Added `/goal <objective>` creation/replacement through the app-server
goal API rather than a model prompt.
- Added `/goal clear`, `/goal pause`, and `/goal unpause` command
variants.
- Added a confirmation menu when the user enters a new goal while
another goal already exists.
- Updated `/goal` help and summary tip text so it reflects the supported
command variants without advertising slash-command token budgets.
- Added footer/statusline goal indicators, including elapsed time and
token budget display when a budget exists from API/tool-created goals.
- Consumes goal updated/cleared notifications so the TUI stays in sync
with external app-server changes.
- Suppresses end-of-turn desktop notifications only when a goal is still
active and follow-up work is expected.
- Preserves slash-command history behavior and avoids leaking queued
`/goal` state into unrelated submissions.

## Verification

- Added TUI unit and snapshot coverage for goal command availability,
summary rendering, control commands, replacement menu behavior,
status/footer display, notification handling, and command history.
2026-04-24 21:16:45 -07:00
Eric Traut
4167628622 Add goal core runtime (4 / 5) (#18076)
Adds the core runtime behavior for active goals on top of the model
tools from PR 3.

## Why

A long-running goal should be a core runtime concern, not something
every client has to implement. Core owns the turn lifecycle, tool
completion boundaries, interruptions, resume behavior, and token usage,
so it is the right place to account progress, enforce budgets, and
decide when to continue work.

## What changed

- Centralized goal lifecycle side effects behind
`Session::goal_runtime_apply(GoalRuntimeEvent::...)`.
- Starts goal continuation turns only when the session is idle; pending
user input and mailbox work take priority.
- Accounts token and wall-clock usage at turn, tool, mutation,
interrupt, and resume boundaries; `get_thread_goal` remains read-only.
- Preserves sub-second wall-clock remainder across accounting boundaries
so long-running goals do not drift downward over time.
- Treats token budget exhaustion as a soft stop by marking the goal
`budget_limited` and injecting wrap-up steering instead of aborting the
active turn.
- Suppresses budget steering when `update_goal` marks a goal complete.
- Pauses active goals on interrupt and auto-reactivates paused goals
when a thread resumes outside plan mode.
- Suppresses repeated automatic continuation when a continuation turn
makes no tool calls.
- Added continuation and budget-limit prompt templates.

## Verification

- Added focused core coverage for continuation scheduling, accounting
boundaries, budget-limit steering, completion accounting, interrupt
pause behavior, resume auto-activation, and wall-clock remainder
accounting.
2026-04-24 21:16:00 -07:00
Eric Traut
32ace07ac5 Add goal model tools (3 / 5) (#18075)
Adds the model-facing goal tools on top of the app-server API from PR 2.

## Why

Once goals are persisted and exposed to clients, the model needs a
small, constrained tool surface for goal workflows. The tool contract
should let the model inspect goals, create them only when explicitly
requested, and mark them complete without giving it broad control over
user/runtime-owned state.

## What changed

- Added `get_goal`, `create_goal`, and `update_goal` tool specs behind
the `goals` feature flag.
- Added core goal tool handlers that validate objectives and token
budgets before mutating persisted state.
- Constrained `create_goal` to create only when no goal exists, with
optional `token_budget` only when a budget is explicitly provided.
- Tightened the `create_goal` instructions so the model does not infer
goals from ordinary task requests.
- Constrained `update_goal` to expose only goal completion; pause,
resume, clear, and budget-limited transitions remain user- or
runtime-controlled.
- Registered the goal tools in the tool registry and kept them out of
review contexts where they should not appear.

## Verification

- Added tool-registry coverage for feature gating and tool availability.
- Added core session tests for create/get/update behavior, duplicate
goal rejection, budget validation, and completion-only updates.
2026-04-24 20:54:40 -07:00
Eric Traut
6c874f9b34 Add goal app-server API (2 / 5) (#18074)
Adds the app-server v2 goal API on top of the persisted goal state from
PR 1.

## Why

Clients need a stable app-server surface for reading and controlling
materialized thread goals before the model tools and TUI can use them.
Goal changes also need to be observable by app-server clients, including
clients that resume an existing thread.

## What changed

- Added v2 `thread/goal/get`, `thread/goal/set`, and `thread/goal/clear`
RPCs for materialized threads.
- Added `thread/goal/updated` and `thread/goal/cleared` notifications so
clients can keep local goal state in sync.
- Added resume/snapshot wiring so reconnecting clients see the current
goal state for a thread.
- Added app-server handlers that reconcile persisted rollout state
before direct goal mutations.
- Updated the app-server README plus generated JSON and TypeScript
schema fixtures for the new API surface.

## Verification

- Added app-server v2 coverage for goal get/set/clear behavior,
notification emission, resume snapshots, and non-local thread-store
interactions.
2026-04-24 20:53:41 -07:00
Eric Traut
0ee737cea6 Add goal persistence foundation (1 / 5) (#18073)
Adds the persisted goal foundation for the rest of the stack. This PR is
intentionally limited to feature flag and state-layer behavior;
app-server APIs, model tools, runtime continuation, and TUI UX are
layered in later PRs.

## Why

Goal mode needs durable thread-level state before clients or model tools
can safely build on it. The state layer needs to know whether a goal
exists, what objective it tracks, whether it is active, paused,
budget-limited, or complete, and how much time/token usage has already
been accounted.

## What changed

- Added the `goals` feature flag and generated config schema entry.
- Added the `thread_goals` state table and Rust model for persisted
thread goals.
- Added state runtime APIs for creating, replacing, updating, deleting,
and accounting goal usage.
- Added `goal_id`-based stale update protection so an old goal update
cannot overwrite a replacement.
- Kept this PR scoped to persistence and state runtime behavior, with no
app-server, model-facing, continuation, or TUI behavior yet.

## Verification

- Added state runtime coverage for goal creation, replacement, stale
update protection, status transitions, token-budget behavior, and usage
accounting.
2026-04-24 20:51:38 -07:00
Curtis 'Fjord' Hawthorne
8a559e7938 Remove js_repl feature (#19410) 2026-04-24 17:49:29 -07:00
Curtis 'Fjord' Hawthorne
cf02e9c052 Fix Bazel cargo_bin runfiles paths (#19468)
## Summary

Fix a Bazel-only path resolution bug in
`codex_utils_cargo_bin::cargo_bin`.

Under Bazel runfiles, `rlocation` can return a relative `bazel-out/...`
path even though `cargo_bin()` documents that it returns an absolute
path. That can break callers that store the returned binary path and
later spawn it after changing cwd, because the relative path is resolved
from the wrong directory.

This patch absolutizes the runfiles-resolved path before returning it.
2026-04-24 17:47:31 -07:00
viyatb-oai
1c3287125f ci: pin codex-action v1.7 (#19472)
## Summary
- update Codex issue automation to pin `openai/codex-action` to
`5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02`, the commit for `v1.7`
- keep the release intent visible with `# v1.7` comments beside the hash
pins

## Test plan
- `git diff --check`
- `yq e '.' .github/workflows/issue-labeler.yml`
- `yq e '.' .github/workflows/issue-deduplicator.yml`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-25 00:44:04 +00:00
Michael Bolin
789f387982 permissions: remove legacy read-only access modes (#19449)
## Why

`ReadOnlyAccess` was a transitional legacy shape on `SandboxPolicy`:
`FullAccess` meant the historical read-only/workspace-write modes could
read the full filesystem, while `Restricted` tried to carry partial
readable roots. The partial-read model now belongs in
`FileSystemSandboxPolicy` and `PermissionProfile`, so keeping it on
`SandboxPolicy` makes every legacy projection reintroduce lossy
read-root bookkeeping and creates unnecessary noise in the rest of the
permissions migration.

This PR makes the legacy policy model narrower and explicit:
`SandboxPolicy::ReadOnly` and `SandboxPolicy::WorkspaceWrite` represent
the old full-read sandbox modes only. Split readable roots, deny-read
globs, and platform-default/minimal read behavior stay in the runtime
permissions model.

## What changed

- Removes `ReadOnlyAccess` from
`codex_protocol::protocol::SandboxPolicy`, including the generated
`access` and `readOnlyAccess` API fields.
- Updates legacy policy/profile conversions so restricted filesystem
reads are represented only by `FileSystemSandboxPolicy` /
`PermissionProfile` entries.
- Keeps app-server v2 compatible with legacy `fullAccess` read-access
payloads by accepting and ignoring that no-op shape, while rejecting
legacy `restricted` read-access payloads instead of silently widening
them to full-read legacy policies.
- Carries Windows sandbox platform-default read behavior with an
explicit override flag instead of depending on
`ReadOnlyAccess::Restricted`.
- Refreshes generated app-server schema/types and updates tests/docs for
the simplified legacy policy shape.

## Verification

- `cargo check -p codex-app-server-protocol --tests`
- `cargo check -p codex-windows-sandbox --tests`
- `cargo test -p codex-app-server-protocol sandbox_policy_`


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19449).
* #19395
* #19394
* #19393
* #19392
* #19391
* __->__ #19449
2026-04-24 17:16:58 -07:00
Celia Chen
d19de6d150 fix: Bedrock GPT-5.4 reasoning levels (#19461)
## Why

When using the Amazon Bedrock provider with `openai.gpt-5.4-cmb`, the
model picker allowed `xhigh` because the CMB catalog entry was derived
from the bundled `gpt-5.4` reasoning metadata. Bedrock rejects that
effort level, causing the request to fail before the turn can run:

```text
{"error":{"code":"validation_error","message":"Failed to deserialize the JSON body into the target type: Invalid 'reasoning': Invalid 'effort': unknown variant `xhigh`, expected one of `high`, `low`, `medium`, `minimal` at line 1 column 77239","param":null,"type":"invalid_request_error"}}
```

## What Changed

- Replace the runtime lookup of bundled `gpt-5.4` metadata for
`openai.gpt-5.4-cmb` with an explicit Bedrock CMB `ModelInfo` entry.
- Advertise only the Bedrock-supported CMB reasoning levels: `minimal`,
`low`, `medium`, and `high`.
- Keep the existing GPT OSS Bedrock model metadata and reasoning levels
unchanged.
- Add catalog coverage for the hardcoded CMB metadata and
Bedrock-compatible reasoning level list.
2026-04-25 00:05:22 +00:00
Rasmus Rygaard
5378cccd8a Refactor log DB into LogWriter interface (#19234)
## Why

This prepares feedback log capture for a future remote app-server hook
sink without changing the current local SQLite upload path. The
important boundary is now intentionally small: a log sink is a tracing
`Layer` that can also flush entries it has accepted.

That keeps the existing SQLite implementation simple while giving the
upcoming gRPC sink a place to fit beside it. SQLite and gRPC have
different worker/write semantics, so this PR avoids introducing a shared
buffered-sink abstraction and instead lets each `LogWriter` own the
buffering mechanics it needs.

## What Changed

- Added `LogSinkQueueConfig` with the existing local defaults: queue
capacity `512`, batch size `128`, and flush interval `2s`.
- Added `LogDbLayer::start_with_config(...)` while preserving
`LogDbLayer::start(...)` and `log_db::start(...)` defaults.
- Introduced the `LogWriter` trait as the minimal shared interface:
`tracing_subscriber::Layer` plus `flush()`.
- Made `LogDbLayer` implement `LogWriter`.
- Kept tracing event formatting inside `LogDbLayer`; it still creates
one `LogEntry` per tracing event before queueing it for SQLite.
- Kept normal event capture best-effort and non-blocking via bounded
`try_send`.

## Behavior Notes

This does not change the SQLite schema, retention behavior,
`/feedback/upload`, or Sentry upload behavior. Normal log events still
drop when the queue is full; explicit `flush()` still waits for queue
capacity and receiver processing before returning.

## Verification

- `cargo test -p codex-state log_db`
- `cargo test -p codex-state`
- `just fix -p codex-state`

The added tests cover configured batch-size flushing, configured
interval flushing, queue-full drops, and the flush barrier semantics.
2026-04-24 16:27:39 -07:00
Dylan Hurd
32aad7bd13 Serialize legacy Windows PowerShell sandbox tests (#19453)
## Why

Recent `main` CI had repeated Windows timeouts in the legacy sandbox
process tests:

- `codex-windows-sandbox
session::tests::legacy_capture_powershell_emits_output` failed in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
[24905411571](https://github.com/openai/codex/actions/runs/24905411571),
[24903336028](https://github.com/openai/codex/actions/runs/24903336028),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).
- `legacy_tty_powershell_emits_output_and_accepts_input` failed in the
same set of runs.
- `legacy_non_tty_cmd_emits_output` failed in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
and
[24903336028](https://github.com/openai/codex/actions/runs/24903336028).
- `legacy_non_tty_powershell_emits_output` failed in runs
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
and
[24903336028](https://github.com/openai/codex/actions/runs/24903336028).

These failures were 30s timeouts on Windows x64 and/or arm64 rather than
assertion failures.

## Root Cause

The active legacy Windows sandbox process tests all exercise host-level
resources: sandbox setup, ACL/user state, private desktop process
launch, stdio capture, and PowerShell/cmd child cleanup. Running several
of these tests concurrently can leave them competing for the same
Windows sandbox setup path and process/session resources, which makes
command startup or output collection hang under CI load.

## What Changed

- Added a shared in-process mutex for the active legacy Windows sandbox
process tests.
- Held that guard across each legacy cmd/PowerShell process test so
those host-resource-heavy cases run one at a time.
- Kept the skipped legacy cmd TTY tests unchanged.

## Why This Should Be Reliable

The tests still use unique homes and run the real legacy sandbox process
path, but they no longer overlap the fragile host-level setup and
process/session lifecycle. Serializing just this small group removes the
concurrency race without reducing the behavioral coverage of each test.

## Verification

- `cargo test -p codex-windows-sandbox`
- GitHub Windows CI is the primary validation signal for the affected
tests; on this PR, Windows clippy, Windows release, and Windows local
Bazel passed after the serialization fix.
2026-04-24 16:18:30 -07:00
rreichel3-oai
219c65dc2f [codex] Forward Codex Apps tool call IDs to backend metadata (#19207)
## Summary
- include the outer tool `call_id` in Codex Apps MCP request metadata
under `_meta._codex_apps.call_id`
- preserve existing Codex Apps metadata like `resource_uri` and
`contains_mcp_source`
- add request metadata coverage for both the existing-metadata and
no-existing-metadata cases

## Why
The paired backend change in
[openai/openai#850796](https://github.com/openai/openai/pull/850796)
updates MCP compliance logging to prefer `_meta._codex_apps.call_id`
instead of the JSON-RPC request id. This client change sends that outer
tool call id so the backend can record the model/tool call identifier
when it is available.

This is wire-compatible with older backends because `_meta._codex_apps`
is already reserved backend-only metadata. Backends that do not read
`call_id` will ignore the extra field.

## Testing
- `cargo test -p codex-core request_meta`
- `just fmt`
- `just fix -p codex-core`
2026-04-24 18:49:34 -04:00
xl-openai
1e560f33e1 feat: Compress skill paths with root aliases (#19098)
Add skill root tracking so model-visible skill lists can use short path
aliases when absolute paths would exceed the metadata budget.
2026-04-24 15:49:07 -07:00
Tom
588f7a9fc4 [codex] add non-local thread store regression harness (#19266)
- Add an integration test that guarantees nothing gets written to codex
home dir or sqlite when running a rollout with a non-local ThreadStore
- Add an in-memory "spy" ThreadStore for tests like this

Note I could not find a good way to also ensure there were no filesystem
_reads_ that didn't go through threadstore. I explored a more elaborate
sandboxed-subprocess approach but it isn't platform portable and felt
like it wasn't (yet) worth it.
2026-04-24 15:45:44 -07:00
Konstantine Kahadze
3c6e2638ac Clarify bundled OpenAI Docs upgrade guide wording (#19422)
## Summary
- Mirrors the OpenAI Docs skill cleanup in the bundled Codex skill copy
- Clarifies reasoning-effort recommendation wording
- Replaces internal snake_case prompt block names with natural-language
guidance aligned to the prompting guide

## Test plan
- `git diff --check`
- Verified the old snake_case prompt block names no longer appear in the
bundled upgrade guide
2026-04-24 22:35:52 +00:00
Michael Bolin
9b8a1fbefc ci: publish codex-app-server release artifacts (#19447)
## Why
The VS Code extension and desktop app do not need the full TUI binary,
and `codex-app-server` is materially smaller than standalone `codex`. We
still want to publish it as an official release artifact, but building
it by tacking another `--bin` onto the existing release `cargo build`
invocations would lengthen those jobs.

This change keeps `codex-app-server` on its own release bundle so it can
build in parallel with the existing `codex` and helper bundles.

## What changed
- Made `.github/workflows/rust-release.yml` bundle-aware so each macOS
and Linux MUSL target now builds either the existing `primary` bundle
(`codex` and `codex-responses-api-proxy`) or a standalone `app-server`
bundle (`codex-app-server`).
- Preserved the historical artifact names for the primary macOS/Linux
bundles so `scripts/stage_npm_packages.py` and
`codex-cli/scripts/install_native_deps.py` continue to find release
assets under the paths they already expect, while giving the new
app-server artifacts distinct names.
- Added a matching `app-server` bundle to
`.github/workflows/rust-release-windows.yml`, and updated the final
Windows packaging job to download, sign, stage, and archive
`codex-app-server.exe` alongside the existing release binaries.
- Generalized the shared signing actions in
`.github/actions/linux-code-sign/action.yml`,
`.github/actions/macos-code-sign/action.yml`, and
`.github/actions/windows-code-sign/action.yml` so each workflow row
declares its binaries once and reuses that list for build, signing, and
staging.
- Added `codex-app-server` to `.github/dotslash-config.json` so releases
also publish a generated DotSlash manifest for the standalone app-server
binary.
- Kept the macOS DMG focused on the existing `primary` bundle;
`codex-app-server` ships as the regular standalone archives and DotSlash
manifest.

## Verification
- Parsed the modified workflow and action YAML files locally with
`python3` + `yaml.safe_load(...)`.
- Parsed `.github/dotslash-config.json` locally with `python3` +
`json.loads(...)`.
- Reviewed the resulting release matrices, artifact names, and packaging
paths to confirm that `codex-app-server` is built separately on macOS,
Linux MUSL, and Windows, while the existing npm staging and Windows
`codex` zip bundling contracts remain intact.
2026-04-24 15:29:37 -07:00
Ahmed Ibrahim
6de6eaa0c1 [4/4] Honor Streamable HTTP MCP placement (#18584) 2026-04-24 15:03:55 -07:00
Konstantine Kahadze
c43e2fcfbf Add gpt-image-2 to bundled OpenAI Docs skill (#19443)
## Summary
- Mirrors openai/skills#374 in the Codex bundled OpenAI Docs skill
- Adds `gpt-image-2` as the best image generation/edit model
- Updates `gpt-image-1.5` to less expensive image generation/edit
quality

## Test plan
- `git diff --check`
2026-04-24 21:48:45 +00:00
Michael Bolin
db94b1657b ci: stop publishing GNU Linux release artifacts (#19445)
## Why
We already prefer shipping the MUSL Linux builds, and the in-repo
release consumers resolve Linux release assets through the MUSL targets.
Keeping the GNU release jobs around adds release time and extra assets
without serving the paths we actually publish and consume.

This is also easier to reason about as a standalone change: future work
can point back to this PR as the intentional decision to stop publishing
`x86_64-unknown-linux-gnu` and `aarch64-unknown-linux-gnu` release
artifacts.

## What changed
- Removed the `x86_64-unknown-linux-gnu` and `aarch64-unknown-linux-gnu`
entries from the `build` matrix in `.github/workflows/rust-release.yml`.
- Added a short comment in that matrix documenting that Linux release
artifacts intentionally ship MUSL-linked binaries.

## Verification
- Reviewed `.github/workflows/rust-release.yml` to confirm that the
release workflow now only builds Linux release artifacts for
`x86_64-unknown-linux-musl` and `aarch64-unknown-linux-musl`.
2026-04-24 21:29:45 +00:00
Tom
0a9b559c0b Migrate fork and resume reads to thread store (#18900)
- Route cold thread/resume and thread/fork source loading through
ThreadStore reads instead of direct rollout path operations
- Keep lookups that explicitly specify a rollout-path using the local
thread store methods but return an invalid-request error for remote
ThreadStore configurations
- Add some additional unit tests for code path coverage
2026-04-24 13:51:37 -07:00
Michael Bolin
13e0ec1614 permissions: make legacy profile conversion cwd-free (#19414)
## Why

The profile conversion path still required a `cwd` even when it was only
translating a legacy `SandboxPolicy` into a `PermissionProfile`. That
made profile producers invent an ambient `cwd`, which is exactly the
anchoring we are trying to remove from permission-profile data. A legacy
workspace-write policy can be represented symbolically instead: `:cwd =
write` plus read-only `:project_roots` metadata subpaths.

This PR creates that cwd-free base so the rest of the stack can stop
threading cwd through profile construction. Callers that actually need a
concrete runtime filesystem policy for a specific cwd still have an
explicitly named cwd-bound conversion.

## What Changed

- `PermissionProfile::from_legacy_sandbox_policy` now takes only
`&SandboxPolicy`.
- `FileSystemSandboxPolicy::from_legacy_sandbox_policy` is now the
symbolic, cwd-free projection for profiles.
- The old concrete projection is retained as
`FileSystemSandboxPolicy::from_legacy_sandbox_policy_for_cwd` for
runtime/boundary code that must materialize legacy cwd behavior.
- Workspace-write profiles preserve `CurrentWorkingDirectory` and
`ProjectRoots` special entries instead of materializing cwd into
absolute paths.

## Verification

- `cargo check -p codex-protocol -p codex-core -p
codex-app-server-protocol -p codex-app-server -p codex-exec -p
codex-exec-server -p codex-tui -p codex-sandboxing -p
codex-linux-sandbox -p codex-analytics --tests`
- `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol
-p codex-app-server -p codex-exec -p codex-exec-server -p codex-tui -p
codex-sandboxing -p codex-linux-sandbox -p codex-analytics`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19414).
* #19395
* #19394
* #19393
* #19392
* #19391
* __->__ #19414
2026-04-24 13:42:05 -07:00
canvrno-oai
7262c0c450 Skip disabled rows in selection menu numbering and default focus (#19170)
Selection menus in the TUI currently let disabled rows interfere with
numbering and default focus. This makes mixed menus harder to read and
can land selection on rows that are not actionable. This change updates
the shared selection-menu behavior in list_selection_view so disabled
rows are not selected when these views open, and prevents them from
being numbered like selectable rows.

- Disabled rows no longer receive numeric labels
- Digit shortcuts map to enabled rows only
- Default selection moves to the first enabled row in mixed menus
- Updated affected snapshot
- Added snapshot coverage for a plugin detail error popup
- Added a focused unit test for shared selection-view behavior

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-24 13:21:43 -07:00
willwang-openai
687c5d9081 Update unix socket transport to use WebSocket upgrade (#19244)
## Summary
- Switch Unix socket app-server connections to perform the standard
WebSocket HTTP Upgrade handshake
- Update the Unix socket test to exercise a real upgrade over the Unix
stream
- Refresh the app-server README to describe the new Unix socket behavior

## Testing
- `cargo test -p codex-app-server transport::unix_socket_tests`
- `just fmt`
- `git diff --check`
2026-04-24 13:06:51 -07:00
Ruslan Nigmatullin
a3cccbd8ed [codex] Omit fork turns from thread started notifications (#19093)
## Why

`thread/fork` responses intentionally include copied history so the
caller can render the fork immediately, but `thread/started` is a
lifecycle notification. The v2 `Thread` contract says notifications
should return `turns: []`, and the fork path was reusing the response
thread directly, causing copied turns to be emitted through
`thread/started` as well.

## What Changed

- Route app-server `thread/started` notification construction through a
helper that clears `thread.turns` before sending.
- Keep `thread/fork` responses unchanged so callers still receive copied
history.
- Add persistent and ephemeral fork coverage that asserts
`thread/started` emits an empty `turns` array while the response retains
fork history.

## Testing

- `just fmt`
- `cargo test -p codex-app-server`
2026-04-24 12:31:13 -07:00
Celia Chen
0db6811b7c Fix: use function apply_patch tool for Bedrock model (#19416)
## Why

`openai.gpt-5.4-cmb` is served through the Amazon Bedrock provider,
whose request validator currently accepts `function` and `mcp` tool
specs but rejects Responses `custom` tools. The CMB catalog entry reuses
the bundled `gpt-5.4` metadata, which marks `apply_patch_tool_type` as
`freeform`. That causes Codex to include an `apply_patch` tool with
`type: "custom"`, so even heavily disabled sessions can fail before the
model runs with:

```text
Invalid tools: unknown variant `custom`, expected `function` or `mcp`
```

This is provider-specific: the model should still expose `apply_patch`,
but for Bedrock it needs to use the JSON/function tool shape instead of
the freeform/custom shape.

## What Changed

- Override the `openai.gpt-5.4-cmb` static catalog entry to set
`apply_patch_tool_type` to `function` after inheriting the rest of the
`gpt-5.4` model metadata.
- Update the catalog test expectation so the CMB entry continues to
track `gpt-5.4` metadata except for this Bedrock-specific tool shape
override.

## Verification

- `cargo test -p codex-model-provider`
- `just fix -p codex-model-provider`
2026-04-24 18:45:09 +00:00
mcgrew-oai
dee5f5ea38 Harden package-manager install policy (#19163)
## Summary

This PR hardens package-manager usage across the repo to reduce
dependency supply-chain risk. It also removes the stale `codex-cli`
Docker path, which was already broken on `main`, instead of keeping a
bitrotted container workflow alive.

## What changed

- Updated pnpm package manager pins and workspace install settings.
- Removed stale `codex-cli` Docker assets instead of trying to keep a
broken local container path alive.
- Added uv settings and lockfiles for the Python SDK packages.
- Updated Python SDK setup docs to use `uv sync`.

## Why

This is primarily a security hardening change. It reduces
package-install and supply-chain risk by ensuring dependency installs go
through pinned package managers, committed lockfiles, release-age
settings, and reviewed build-script controls.

For `codex-cli`, the right follow-up was to remove the local Docker path
rather than keep patching it:

- `codex-cli/Dockerfile` installed `codex.tgz` with `npm install -g`,
which bypassed the repo lockfile and age-gated pnpm settings.
- The local `codex-cli/scripts/build_container.sh` helper was already
broken on `main`: it called `pnpm run build`, but
`codex-cli/package.json` does not define a `build` script.
- The container path itself had bitrotted enough that keeping it would
require extra packaging-specific behavior that was not otherwise needed
by the repo.

## Gaps addressed

- Global npm installs bypassed the repo lockfile in Docker and CLI
reinstall paths, including `codex-cli/Dockerfile` and
`codex-cli/bin/codex.js`.
- CI and Docker pnpm installs used `--frozen-lockfile`, but the repo was
missing stricter pnpm workspace settings for dependency build scripts.
- Python SDK projects had `pyproject.toml` metadata but no committed
`uv.lock` coverage or uv age/index settings in `sdk/python` and
`sdk/python-runtime`.
- The secure devcontainer install path used npm/global install behavior
without a local locked package-manager boundary.
- The local `codex-cli` Docker helper was already broken on `main`, so
this PR removes that stale Docker path instead of preserving a broken
surface.
- pnpm was already pinned, but not to the current repo-wide pnpm version
target.

## Verification

- `pnpm install --frozen-lockfile`
- `.devcontainer/codex-install`: `pnpm install --prod --frozen-lockfile`
- `.devcontainer/codex-install`: `./node_modules/.bin/codex --version`
- `sdk/python`: `uv lock --check`, `uv sync --locked --all-extras
--dry-run`, `uv build`
- `sdk/python-runtime`: `uv lock --check`, `uv sync --locked --dry-run`,
`uv build --wheel`
- `pnpm -r --filter ./sdk/typescript run build`
- `pnpm -r --filter ./sdk/typescript run lint`
- `pnpm -r --filter ./sdk/typescript run test`
- `node --check codex-cli/bin/codex.js`
- `docker build -f .devcontainer/Dockerfile.secure -t codex-secure-test
.`
- `cargo build -p codex-cli`
- repo-wide package-manager audit
2026-04-24 14:36:19 -04:00
Konstantine Kahadze
6bb2fa3fd4 Update bundled OpenAI Docs skill for GPT-5.5 (#19407)
## Summary
Updates the bundled OpenAI Docs system skill for GPT-5.5.

## Changes
- Updates the bundled latest-model fallback
- Replaces bundled upgrade guidance with GPT-5.5 migration guidance
- Replaces bundled prompting guidance with GPT-5.5 prompting guidance

## Test plan
- Ran `node scripts/resolve-latest-model-info.js`
- Verified bundled files match the OpenAI Docs skill fallback content
2026-04-24 18:26:47 +00:00
iceweasel-oai
e787358f70 check PID of named pipe consumer (#19283)
## Why
The elevated Windows command runner currently trusts the first process
that connects to its parent-created named pipes. Tightening the pipe ACL
already narrows who can reach that boundary, but verifying the connected
client PID gives the parent one more fail-closed check: it only accepts
the exact runner process it just spawned.

## What changed
- validate `GetNamedPipeClientProcessId` after `ConnectNamedPipe` and
reject clients whose PID does not match the spawned runner
- also did some code de-duplication to route the one-shot elevated
capture flow in `windows-sandbox-rs/src/elevated_impl.rs` through
`spawn_runner_transport()` so both elevated codepaths use the same pipe
bootstrap and PID validation

Using the transport unification here also reduces duplication in the
elevated Windows IPC bootstrap, so future hardening to the runner
handshake only needs to land in one place.

## Validation
- `cargo test -p codex-windows-sandbox`
- manual testing: one-shot elevated path via `target/debug/codex.exe
exec` running a randomized shell command and confirming captured output
- manual testing: elevated session path via `target/debug/codex.exe -c
'windows.sandbox="elevated"' sandbox windows -- python -u -c ...` with
stdin/stdout round-trips (`READY`, then `GOT:...` for two input lines)

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-04-24 17:41:08 +00:00
Alex Zamoshchin
bcc1caa920 respect workspace option for disabling plugins (#18907)
Respects the workspace setting for plugins in Codex

Plugins menu disappears
Plugins do not load
Plugins do not load in composer

no plugins loaded
<img width="809" height="226" alt="Screenshot 2026-04-23 at 3 20 45 PM"
src="https://github.com/user-attachments/assets/3a4dba8e-69c3-4046-a77e-f13ab77f84b4"
/>


no plugins in menu
<img width="293" height="204" alt="Screenshot 2026-04-23 at 3 20 35 PM"
src="https://github.com/user-attachments/assets/5cb9bf52-ad72-488f-b90c-5eb457da09a3"
/>
2026-04-24 17:38:45 +00:00
jif-oai
f802f0a391 chore: drop MCP Plugins and App from Morpheus (#19380)
Quick fix of https://github.com/openai/codex/issues/18333
2026-04-24 17:57:48 +02:00
danwang-oai
11806faf71 Fix hang on turn/interrupt (#18392)
Fix a bug where the `turn/interrupt` RPC hangs when interrupting a turn
that has already completed.

Before this change, `turn/interrupt` requests were queued in app-server
and only answered when a later TurnAborted event arrived. If the target
turn was already complete, core treated Op::Interrupt as a no-op, so no
abort event was emitted and the RPC could hang indefinitely.

This change fixes that in two places:

* Reject turn/interrupt immediately with `INVALID_REQUEST` when the
requested turn is no longer the active turn.
* Resolve any already-accepted pending interrupt requests when the turn
reaches TurnComplete, covering the case where a turn finishes naturally
after the interrupt request is accepted but before it aborts.

I tested this by adding a failing test in
707487c063. You may view the results here:
https://github.com/openai/codex/actions/runs/24585182419/

<img width="1512" height="310" alt="CleanShot 2026-04-17 at 16 33 30@2x"
src="https://github.com/user-attachments/assets/f4a88228-b2a4-41f4-9aaa-ec82814096af"
/>
2026-04-24 10:47:50 -04:00
jif-oai
28742866c7 Add agents.interrupt_message for interruption markers (#19351)
## Why

Agent interruptions currently always persist a model-visible
interrupted-turn marker before emitting `TurnAborted`. That marker is
useful by default because it gives the next model turn context about a
deliberately interrupted task, but some deployments need to suppress
that history injection entirely while still keeping the client-visible
interruption event.

## What changed

- Add `[agents] interrupt_message = false` to disable the model-visible
interrupted-turn marker.
- Resolve the setting into `Config::agent_interrupt_message_enabled`,
defaulting to `true` so existing behavior is unchanged.
- Apply the setting to both live interrupted turns and interrupted fork
snapshots.
- Keep emitting `TurnAborted` even when the history marker is disabled.
- Regenerate `core/config.schema.json` for the new
`agents.interrupt_message` field.

## Testing

- `cargo test -p codex-core load_config_resolves_agent_interrupt_message
-- --nocapture`
- `cargo test -p codex-core
disabled_interrupted_fork_snapshot_appends_only_interrupt_event --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_developer_input_message --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_can_disable_interrupted_marker --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo check -p codex-core`
2026-04-24 16:02:45 +02:00
jif-oai
deb4509302 feat: surface multi-agent thread limit in spawn description (#19360)
## Summary
- Thread `agent_max_threads` into `ToolsConfig` and
`SpawnAgentToolOptions`.
- Render the configured `max_concurrent_threads_per_session` value in
the MultiAgentV2 `spawn_agent` description.
- Cover the description text in `codex-tools` unit tests and
`codex-core` tool spec tests.

## Validation
- `just fmt`
- `cargo test -p codex-tools`
- `cargo test -p codex-core spawn_agent_description`
- `git diff --check`

## Notes
- `cargo test -p codex-core` was also attempted, but unrelated
environment-sensitive tests failed with the active local environment.
Examples: approvals reviewer defaults observed `AutoReview` instead of
`User`, request-permissions event tests did not emit events, and
proxy-env tests saw `http://127.0.0.1:50604` from the active proxy
environment.

Co-authored-by: Codex <noreply@openai.com>
2026-04-24 15:13:54 +02:00
jif-oai
9eadff9713 chore: alias max_concurrent_threads_per_session (#19354) 2026-04-24 14:33:03 +02:00
jif-oai
120aa07d81 Make MultiAgentV2 interruption markers assistant-authored (#19124)
## Why

`MultiAgentV2` follow-up messages are delivered to agents as
assistant-authored `InterAgentCommunication` envelopes. When
`followup_task` used `interrupt: true`, the interrupted-turn guidance
was still persisted as a contextual user message, so model-visible
history made a system-generated interruption boundary look
user-authored.

This keeps interruption guidance consistent with the rest of the v2
inter-agent message stream while preserving the legacy marker shape for
non-v2 sessions.

## What changed

- Make `interrupted_turn_history_marker` feature-aware.
- Record the interrupted-turn marker as an assistant `OutputText`
message when `Feature::MultiAgentV2` is enabled.
- Keep the existing user contextual fragment for non-v2 sessions.
- Apply the same feature-aware marker to interrupted fork snapshots.
- Add coverage for the live `followup_task` interrupt path and the
helper-level v2 marker shape.

## Testing

- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_assistant_output_message --
--nocapture`
- `cargo test -p codex-core interrupted_fork_snapshot -- --nocapture`
2026-04-24 13:39:26 +02:00
jif-oai
21463a5074 fix alpha build (#19350) 2026-04-24 13:36:05 +02:00
sayan-oai
c10f95ddac Update models.json and related fixtures (#19323)
Supersedes #18735.

The scheduled rust-release-prepare workflow force-pushed
`bot/update-models-json` back to the generated models.json-only diff,
which dropped the test and snapshot updates needed for CI.

This PR keeps the latest generated `models.json` from #18735 and adds
the corresponding fixture updates:
- preserve model availability NUX in the app-server model cache fixture
- update core/TUI expectations for the new `gpt-5.4` `xhigh` default
reasoning
- refresh affected TUI chatwidget snapshots for the `gpt-5.5`
default/model copy changes

Validation run locally while preparing the fix:
- `just fmt`
- `cargo test -p codex-app-server model_list`
- `cargo test -p codex-core includes_no_effort_in_request`
- `cargo test -p codex-core
includes_default_reasoning_effort_in_request_when_defined_by_model_info`
- `cargo test -p codex-tui --lib chatwidget::tests`
- `cargo insta pending-snapshots`

---------

Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
2026-04-24 11:14:13 +02:00
Eric Traut
ddfa691752 Surface reasoning tokens in exec JSON usage (#19308)
## Summary

Fixes #19022.

`codex exec --json` currently emits `turn.completed.usage` with input,
cached input, and output token counts, but drops the reasoning-token
split that Codex already receives through thread token usage updates.
Programmatic consumers that rely on the JSON stream, especially
ephemeral runs that do not write rollout files, need this field to
accurately display reasoning-model usage.

This PR adds `reasoning_output_tokens` to the public exec JSON `Usage`
payload and maps it from the existing `ThreadTokenUsageUpdated` total
token usage data.

## Verification

- Added coverage to
`event_processor_with_json_output::token_usage_update_is_emitted_on_turn_completion`
so `turn.completed.usage.reasoning_output_tokens` is asserted.
- Updated SDK expectations for `run()` and `runStreamed()` so TypeScript
consumers see the new usage field.
- Ran `cargo test -p codex-exec`.
- Ran `pnpm --filter ./sdk/typescript run build`.
- Ran `pnpm --filter ./sdk/typescript run lint`.
- Ran `pnpm --filter ./sdk/typescript exec jest --runInBand
--testTimeout=30000`.
2026-04-24 01:54:11 -07:00
Eric Traut
6f87eb0479 Hide unsupported MCP bearer_token from config schema (#19294)
## Summary

Fixes #19275.

Codex runtime rejects inline MCP `bearer_token` config entries and asks
users to configure `bearer_token_env_var` instead, but the generated
config schema still advertised `mcp_servers.<name>.bearer_token` as a
supported field. That made editor/schema validation disagree with
runtime validation.

This keeps `bearer_token` in `RawMcpServerConfig` so Codex can continue
producing the targeted runtime error for recent or existing configs, but
skips the field during schemars generation. The checked-in
`core/config.schema.json` fixture now exposes `bearer_token_env_var`
without exposing unsupported inline `bearer_token`.

## Verification

- Added `config_schema_hides_unsupported_inline_mcp_bearer_token` to
assert the generated schema hides `bearer_token` while preserving
`bearer_token_env_var`.
- Ran `cargo test -p codex-config`.
- Ran `cargo test -p codex-core config_schema`.
2026-04-24 00:17:43 -07:00
sayan-oai
e083b6c757 chore: apply truncation policy to unified_exec (#19247)
we were not respecting turn's `truncation_policy` to clamp output tokens
for `unified_exec` and `write_stdin`.

this meant truncation was only being applied by `ContextManager` before
the output was stored in-memory (so it _was_ being truncated from
model-visible context), but the full output was persisted to rollout on
disk.

now we respect that `truncation_policy` and `ContextManager`-level
truncation remains a backup.

### Tests
added tests, tested locally.
2026-04-24 00:17:39 -07:00
Eric Traut
ac8c9fc49c Reject unsupported js_repl image MIME types (#19292)
## Summary

`codex.emitImage` accepted arbitrary image MIME types for byte payloads
and data URLs. That allowed a value like `image/rgba` to be wrapped as
an `input_image`, even though it is not a supported encoded image
format, so the invalid image could reach the model-input path and
trigger output sanitization.

This results in a panic in debug builds because the output sanitization
is meant as a final safety net, not a primary means of rejecting invalid
image types. I've hit this case multiple times when executing certain
long-running tasks.

This PR rejects unsupported image MIME types before they are emitted
from `js_repl`.

## Changes

- Validate `codex.emitImage({ bytes, mimeType })` in the JS kernel so
only encoded PNG, JPEG, WebP, or GIF payloads are accepted.
- Apply the same MIME allowlist to direct image data URLs, including the
Rust host-side validation path.
- Clarify the JS REPL instructions so agents know byte payloads must
already be encoded as PNG/JPEG/WebP/GIF.
2026-04-24 00:14:51 -07:00
Michael Bolin
b68366718b ci: reuse Bazel CI startup for target-discovery queries (#19232)
## Why

A rerun of the Windows Bazel clippy job after
[#19161](https://github.com/openai/codex/pull/19161) had exactly the
cache behavior we wanted in BuildBuddy: zero action-cache misses. Even
so, the GitHub job still took a little over five minutes.

The problem was that the job was paying for two separate Bazel startup
paths:

1. a `bazel query` to discover extra lint targets
2. the real `bazel build --config=clippy ...` invocation

On Windows, that query was bypassing the CI Bazel wrapper, so it did not
reuse the same `--output_user_root`, CI config, or remote-cache setup as
the real build. In practice that meant the rerun could still cold-start
a separate Bazel server before the actual clippy build even began.

## What

- add `.github/scripts/run-bazel-query-ci.sh` to run CI-side Bazel
queries with the same startup and cache-related flags as the main Bazel
command
- switch `scripts/list-bazel-clippy-targets.sh` to use that helper for
manual `rust_test` target discovery
- switch `tools/argument-comment-lint/list-bazel-targets.sh` to use the
same helper
- simplify `.github/scripts/run-argument-comment-lint-bazel.sh` so its
Windows-only query path also goes through the shared helper

This keeps the target-discovery queries aligned with the later
build/test invocation instead of treating them as a separate cold Bazel
session.

## Verification

- `bash -n .github/scripts/run-bazel-query-ci.sh`
- `bash -n scripts/list-bazel-clippy-targets.sh`
- `bash -n tools/argument-comment-lint/list-bazel-targets.sh`
- `bash -n .github/scripts/run-argument-comment-lint-bazel.sh`
- mocked a Windows invocation of `run-bazel-query-ci.sh` and verified it
forwards `--output_user_root`, `--config=ci-windows`, the BuildBuddy
auth header, and the repository cache flags

## Docs

No documentation updates are needed.
2026-04-23 23:26:17 -07:00
Eric Traut
d87d918716 Resolve relative agent role config paths from layers (#19261)
Fixes #19257.

## Summary

Agent roles declared in config layers can set `config_file` to a
relative path, but deserializing the layer-local `[agents.*]` table
happened without an `AbsolutePathBuf` base path. That caused configs
like `config_file = "agents/my-role.toml"` to fail with `AbsolutePathBuf
deserialized without a base path`.

This updates agent role layer loading to deserialize `[agents.*]` while
the layer config folder is active as the path base, matching the
behavior documented for `AgentRoleToml.config_file`. It also adds
coverage for a user config layer with a relative agent role
`config_file`.
2026-04-23 23:23:11 -07:00
Michael Bolin
4816b89204 permissions: make profiles represent enforcement (#19231)
## Why

`PermissionProfile` is becoming the canonical permissions abstraction,
but the old shape only carried optional filesystem and network fields.
It could describe allowed access, but not who is responsible for
enforcing it. That made `DangerFullAccess` and `ExternalSandbox` lossy
when profiles were exported, cached, or round-tripped through app-server
APIs.

The important model change is that active permissions are now a disjoint
union over the enforcement mode. Conceptually:

```rust
pub enum PermissionProfile {
    Managed {
        file_system: FileSystemSandboxPolicy,
        network: NetworkSandboxPolicy,
    },
    Disabled,
    External {
        network: NetworkSandboxPolicy,
    },
}
```

This distinction matters because `Disabled` means Codex should apply no
outer sandbox at all, while `External` means filesystem isolation is
owned by an outside caller. Those are not equivalent to a broad managed
sandbox. For example, macOS cannot nest Seatbelt inside Seatbelt, so an
inner sandbox may require the outer Codex layer to use no sandbox rather
than a permissive one.

## How Existing Modeling Maps

Legacy `SandboxPolicy` remains a boundary projection, but it now maps
into the higher-fidelity profile model:

- `ReadOnly` and `WorkspaceWrite` map to `PermissionProfile::Managed`
with restricted filesystem entries plus the corresponding network
policy.
- `DangerFullAccess` maps to `PermissionProfile::Disabled`, preserving
the “no outer sandbox” intent instead of treating it as a lax managed
sandbox.
- `ExternalSandbox { network_access }` maps to
`PermissionProfile::External { network }`, preserving external
filesystem enforcement while still carrying the active network policy.
- Split runtime policies that legacy `SandboxPolicy` cannot faithfully
express, such as managed unrestricted filesystem plus restricted
network, stay `Managed` instead of being collapsed into
`ExternalSandbox`.
- Per-command/session/turn grants remain partial overlays via
`AdditionalPermissionProfile`; full `PermissionProfile` is reserved for
complete active runtime permissions.

## What Changed

- Change active `PermissionProfile` into a tagged union: `managed`,
`disabled`, and `external`.
- Keep partial permission grants separate with
`AdditionalPermissionProfile` for command/session/turn overlays.
- Represent managed filesystem permissions as either `restricted`
entries or `unrestricted`; `glob_scan_max_depth` is non-zero when
present.
- Preserve old rollout compatibility by accepting the pre-tagged `{
network, file_system }` profile shape during deserialization.
- Preserve fidelity for important edge cases: `DangerFullAccess`
round-trips as `disabled`, `ExternalSandbox` round-trips as `external`,
and managed unrestricted filesystem + restricted network stays managed
instead of being mistaken for external enforcement.
- Preserve configured deny-read entries and bounded glob scan depth when
full profiles are projected back into runtime policies, including
unrestricted replacements that now become `:root = write` plus deny
entries.
- Regenerate the experimental app-server v2 JSON/TypeScript schema and
update the `command/exec` README example for the tagged
`permissionProfile` shape.

## Compatibility

Legacy `SandboxPolicy` remains available at config/API boundaries as the
compatibility projection. Existing rollout lines with the old
`PermissionProfile` shape continue to load. The app-server
`permissionProfile` field is experimental, so its v2 wire shape is
intentionally updated to match the higher-fidelity model.

## Verification

- `just write-app-server-schema`
- `cargo check --tests`
- `cargo test -p codex-protocol permission_profile`
- `cargo test -p codex-protocol
preserving_deny_entries_keeps_unrestricted_policy_enforceable`
- `cargo test -p codex-app-server-protocol
permission_profile_file_system_permissions`
- `cargo test -p codex-app-server-protocol serialize_client_response`
- `cargo test -p codex-core
session_configured_reports_permission_profile_for_external_sandbox`
- `just fix`
- `just fix -p codex-protocol`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-core`
- `just fix -p codex-app-server`
2026-04-23 23:02:18 -07:00
xli-oai
33cc135cc3 [codex] Support remote plugin install writes (#18917)
## Summary
- Add a remote plugin install write call that POSTs the selected remote
plugin to the ChatGPT cloud plugin API.
- Align remote install with the latest remote read contract:
`pluginName` carries the backend remote plugin id directly, for example
`plugins~Plugin_linear`, and install no longer synthesizes
`<name>@<marketplace>` ids.
- Validate remote install ids with the same character rules as remote
read, return the same install response shape as local installs, and
include mocked app-server coverage for the write path.

## Validation
- `just fmt`
- `cargo test -p codex-app-server --test all plugin_install`
- `cargo test -p codex-core-plugins`
- `just fix -p codex-app-server`
- `just fix -p codex-core-plugins`
2026-04-23 22:10:15 -07:00
Ruslan Nigmatullin
19badb0be2 app-server: persist device key bindings in sqlite (#19206)
## Why

Device-key providers should only own platform key material. The
account/client binding used to authorize a signing payload is app-server
state, and keeping that state in provider-specific metadata makes the
same check harder to audit and harder to share across platform
implementations.

Persisting the binding in the shared state database gives the device-key
crate a platform-neutral source of truth before it asks a provider to
sign. It also lets app-server move potentially blocking key operations
off the main message processor path, which matters once providers may
wait for OS authentication prompts.

## What changed

- Add a `device_key_bindings` state migration plus `StateRuntime`
helpers keyed by `key_id`.
- Add an async `DeviceKeyBindingStore` abstraction to `codex-device-key`
and use it from `DeviceKeyStore::create` and `DeviceKeyStore::sign`.
- Keep provider calls behind async store methods and run the synchronous
provider work through `spawn_blocking`.
- Wire app-server device-key RPC handling to the SQLite-backed binding
store and spawn response/error delivery tasks for device-key requests.
- Run the turn-start tracing test on the existing larger current-thread
test harness after the larger async surface made the default test stack
too small locally.

## Validation

- `cargo test -p codex-device-key`
- `cargo test -p codex-state device_key`
- `cargo test -p codex-state`
- `cargo test -p codex-app-server device_key`
- `cargo test -p codex-app-server
message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
- `cargo test -p codex-app-server`
- `just fix -p codex-device-key`
- `just fix -p codex-state`
- `just fix -p codex-app-server`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `git diff --check`
2026-04-23 21:55:56 -07:00
Celia Chen
e8d8080818 feat: let model providers own model discovery (#18950)
## Why

`codex-models-manager` had grown to own provider-specific concerns:
constructing OpenAI-compatible `/models` requests, resolving provider
auth, emitting request telemetry, and deciding how provider catalogs
should be sourced. That made the manager harder to reuse for providers
whose model catalog is not fetched from the OpenAI `/models` endpoint,
such as Amazon Bedrock.

This change moves provider-specific model discovery behind
provider-owned implementations, so the models manager can focus on
refresh policy, cache behavior, picker ordering, and model metadata
merging.

## What Changed

- Introduced a `ModelsManager` trait with separate `OpenAiModelsManager`
and `StaticModelsManager` implementations.
- Added `ModelsEndpointClient` so OpenAI-compatible HTTP fetching lives
outside `codex-models-manager`.
- Moved `/models` request construction, provider auth resolution,
timeout handling, and request telemetry into `codex-model-provider` via
`OpenAiModelsEndpoint`.
- Added provider-owned `models_manager(...)` construction so configured
OpenAI-compatible providers use `OpenAiModelsManager`, while
static/catalog-backed providers can return `StaticModelsManager`.
- Added an Amazon Bedrock static model catalog for the GPT OSS Bedrock
model IDs.
- Updated core/session/thread manager code and tests to depend on
`Arc<dyn ModelsManager>`.
- Moved offline model test helpers into
`codex_models_manager::test_support`.
## Metadata References

The Bedrock catalog metadata is based on the official Amazon Bedrock
OpenAI model documentation:

- [Amazon Bedrock OpenAI
models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-openai.html)
lists the Bedrock model IDs, text input/output modalities, and `128,000`
token context window for `gpt-oss-20b` and `gpt-oss-120b`.
- [Amazon Bedrock `gpt-oss-120b` model
card](https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-openai-gpt-oss-120b.html)
lists the `bedrock-runtime` model ID `openai.gpt-oss-120b-1:0`, the
`bedrock-mantle` model ID `openai.gpt-oss-120b`, text-only modalities,
and `128K` context window.
- [OpenAI `gpt-oss-120b` model
docs](https://developers.openai.com/api/docs/models/gpt-oss-120b)
document configurable reasoning effort with `low`, `medium`, and `high`,
plus text input/output modality.

The display names, default reasoning effort, and priority ordering are
Codex-local catalog choices.

## Test Plan
- Manually verified app-server model listing with an AWS profile:

```shell
CODEX_HOME="$(mktemp -d)" cargo run -p codex-app-server-test-client -- \
  --codex-bin ./target/debug/codex \
  -c 'model_provider="amazon-bedrock"' \
  -c 'model_providers.amazon-bedrock.aws.profile="codex-bedrock"' \
  -c 'model_providers.amazon-bedrock.aws.region="us-west-2"' \
  model-list
```

The response returned the Bedrock catalog with `openai.gpt-oss-120b-1:0`
as the default model and `openai.gpt-oss-20b-1:0` as the second listed
model, both text-only and supporting low/medium/high reasoning effort.
2026-04-24 04:28:25 +00:00
xl-openai
53be451673 feat: Use short SHA versions for curated plugin cache entries (#19095)
Curated plugin cache entries now use an 8-character SHA prefix, instead
of the full SHA, as the cache folder version number.
2026-04-23 21:15:03 -07:00
cassirer-openai
a9c111da54 [rollout_trace] Trace sessions and multi-agent edges (#18879)
## Summary

Adds the remaining session and multi-agent edge wiring needed to
reconstruct rollout relationships across spawned agents, resumed
sessions, and parent/child message delivery.

## Stack

This is PR 4/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This is the stack layer that makes traces useful for multi-threaded
agent workflows. The main invariant is that reconstructed relationships
should come from durable rollout data rather than transient in-memory
manager state wherever possible.

The PR is intentionally small relative to the preceding layers: it uses
the recorder and reducer contracts already established by the stack and
only adds the session/agent relationship events needed by later debug
reduction.
2026-04-24 02:29:45 +00:00
starr-openai
49fb25997f Add sticky environment API and thread state (#18897)
## Summary
- add sticky environment selections to app-server v2 thread/start and
turn/start request flow
- carry thread-level selections through core session/thread state
- add app-server coverage for sticky selections and turn overrides

## Stack
1. This PR: API and thread persistence
2. #18898: config.toml named environment loading
3. #18899: downstream tool/runtime consumers

## Validation
- Not run locally; split only.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-23 18:57:13 -07:00
cassirer-openai
e3c8720a99 [rollout_trace] Add debug trace reduction command (#18880)
## Summary

Adds the debug CLI entry point for reducing recorded rollout traces.
This gives developers a direct way to inspect whether the emitted trace
stream reduces into the expected conversation/runtime model.

## Stack

This is PR 5/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This PR is intentionally last: it depends on the trace crate, core
recorder, runtime/tool events, and session/agent edge data all existing.
The command should remain a debug/developer tool and avoid adding new
runtime behavior.

The useful review question is whether the CLI exposes the reducer in the
smallest practical way for local inspection without turning the debug
command into a supported user-facing workflow.
2026-04-24 01:56:48 +00:00
Celia Chen
432771c5fd feat: expose AWS account state from account/read (#19048)
## Why

AWS/Bedrock mode currently reports `account: null` with
`requiresOpenaiAuth: false` from `account/read`. That suppresses the
OpenAI-auth requirement, but it does not let app clients distinguish AWS
auth from any other non-OpenAI custom provider. For the prototype AWS
provider UX, clients need a simple provider-derived signal so they can
suppress ChatGPT/API-key login and token-refresh paths without
hardcoding Bedrock checks.

## What changed

- Adds an `aws` variant to the v2 `Account` protocol union.
- Adds `ProviderAccountKind` to `codex-model-provider` so the runtime
provider owns the app-visible account classification.
- Makes Amazon Bedrock return `ProviderAccountKind::Aws` from the
model-provider layer.
- Updates app-server `account/read` to map `ProviderAccountKind` to the
existing `GetAccountResponse` wire shape.
- Preserves the existing `account: null, requiresOpenaiAuth: false`
behavior for other non-OpenAI providers.
- Regenerates the app-server protocol schema fixtures.
- Adds coverage for provider account classification and for the Amazon
Bedrock `account/read` response.

## Testing

- `cargo test -p codex-model-provider`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server get_account_with_aws_provider`

## Notes

I attempted `just bazel-lock-update` and `just bazel-lock-check`, but
both are blocked in my local environment because `bazel` is not
installed.
2026-04-24 01:53:13 +00:00
Eric Traut
72f757d144 Increase app-server WebSocket outbound buffer (#19246)
Fixes #18203.

## Why

Remote TUI clients connected through `codex app-server --listen
ws://...` can receive short bursts of outbound turn and tool-output
notifications. The WebSocket transport previously used the shared
128-message channel capacity for its outbound writer queue, so a healthy
client that briefly lagged during normal output streaming could fill the
queue and be disconnected immediately.

This is a smaller mitigation than #18265: instead of adding a new
overflow/backpressure pipeline, keep the existing non-blocking router
behavior and give WebSocket clients enough bounded headroom for
realistic bursts.

## What Changed

- Added a WebSocket-only outbound writer capacity of `64 * 1024`
messages.
- Used that larger capacity only for the WebSocket data writer queue in
`codex-rs/app-server/src/transport/websocket.rs`.
- Left the shared `CHANNEL_CAPACITY` and the existing disconnect-on-full
behavior unchanged for internal/control channels and genuinely stuck
clients.

## Verification

- `cargo test -p codex-app-server
transport::tests::broadcast_does_not_block_on_slow_connection`
- Manually retried the #18203 repro prompt against the remote TUI and
confirmed it stayed connected.
2026-04-23 18:47:28 -07:00
efrazer-oai
5882f3f95e refactor: route Codex auth through AuthProvider (#18811)
## Summary

This PR moves Codex backend request authentication from direct
bearer-token handling to `AuthProvider`.

The new `codex-auth-provider` crate defines the shared request-auth
trait. `CodexAuth::provider()` returns a provider that can apply all
headers needed for the selected auth mode.

This lets ChatGPT token auth and AgentIdentity auth share the same
callsite path:
- ChatGPT token auth applies bearer auth plus account/FedRAMP headers
where needed.
- AgentIdentity auth applies AgentAssertion plus account/FedRAMP headers
where needed.

Reference old stack: https://github.com/openai/codex/pull/17387/changes

## Callsite Migration

| Area | Change |
| --- | --- |
| backend-client | accepts an `AuthProvider` instead of a raw
token/header |
| chatgpt client/connectors | applies auth through
`CodexAuth::provider()` |
| cloud tasks | keeps Codex-backend gating, applies auth through
provider |
| cloud requirements | uses Codex-backend auth checks and provider
headers |
| app-server remote control | applies provider headers for backend calls
|
| MCP Apps/connectors | gates on `uses_codex_backend()` and keys caches
from generic account getters |
| model refresh | treats AgentIdentity as Codex-backend auth |
| OpenAI file upload path | rejects non-Codex-backend auth before
applying headers |
| core client setup | keeps model-provider auth flow and allows
AgentIdentity through provider-backed OpenAI auth |

## Stack

1. https://github.com/openai/codex/pull/18757: full revert
2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
crate
3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity
auth mode and startup task allocation
4. This PR: migrate Codex backend auth callsites through AuthProvider
5. https://github.com/openai/codex/pull/18904: accept AgentIdentity JWTs
and load `CODEX_AGENT_IDENTITY`

## Testing

Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
2026-04-23 17:14:02 -07:00
Michael Bolin
a9f75e5cda ci: derive cache-stable Windows Bazel PATH (#19161)
## Why

The BuildBuddy runs for PR #19086 and the later `main` build had the
same source tree, but their Windows Bazel action and test cache keys did
not line up. Comparing the downloaded execution logs showed the full
GitHub-hosted Windows runner `PATH` had changed from
`apache-maven-3.9.14` to `apache-maven-3.9.15`.

This repo is not using Maven; the Maven entry was just ambient
hosted-runner state. The problem was that Windows Bazel CI was still
forwarding the whole runner `PATH` into Bazel via `--action_env=PATH`,
`--host_action_env=PATH`, and `--test_env=PATH`, which made otherwise
reusable cache entries sensitive to unrelated runner image churn.

After discussion with the Bazel and BuildBuddy folks, the better shape
for this change was to stop asking Bazel to inherit the ambient Windows
`PATH` and instead compute one explicit cache-stable `PATH` in the
Windows setup action that already prepares the CI toolchain environment.

## What

- remove Windows `PATH` passthrough from `.bazelrc`
- export `CODEX_BAZEL_WINDOWS_PATH` from
`.github/actions/setup-bazel-ci/action.yml`
- move the PATH derivation logic into
`.github/scripts/compute-bazel-windows-path.ps1` so the allow-list is
easier to review and document
- keep only the Windows tool locations these Bazel jobs actually need:
MSVC and SDK paths, Git, PowerShell, Node, DotSlash, and the standard
Windows system directories
- update `.github/scripts/run-bazel-ci.sh` to require that explicit
value and forward it to Bazel action, host action, and test environments
- log the derived `CODEX_BAZEL_WINDOWS_PATH` in the setup step to
simplify cache-key debugging

## Verification

- `bash -n .github/scripts/run-bazel-ci.sh`
- `ruby -e 'require "yaml"; YAML.load_file(ARGV[0])'
.github/actions/setup-bazel-ci/action.yml`
- PowerShell parse check for
`.github/scripts/compute-bazel-windows-path.ps1`
- simulated a representative Windows `PATH` in PowerShell; the
allow-list retained MSVC, Git, PowerShell, Node, Windows, and DotSlash
entries while dropping Maven
2026-04-23 22:28:00 +00:00
iceweasel-oai
867820ac7e do not attempt ACLs on installed codex dir (#19214)
We used to attempt a read-ACL on the same dir as `codex.exe` to grant
the sandbox user the ability to invoke `codex-command-runner.exe`. That
worked for the CLI case but it always fails for the installed desktop
app.

We have another solution already in place that copies
`codex-command-runner.exe` to `CODEX_HOME/.sandbox-bin` so we don't even
need this anymore. It causes a scary looking error in the logs that is a
non-issue and is therefore confusing
2026-04-23 22:21:48 +00:00
iceweasel-oai
2e228969be guide Windows to use -WindowStyle Hidden for Start-Process calls (#19044)
Sometimes codex runs `Start-Process` to start up a service or something
similar, which launches a user-visible powershell window that probably
doesn't get cleaned up. This instruction change encourages it to do so
using a hidden window.

This was reported in
https://openai.slack.com/archives/C09K6H5DZC4/p1776741272870519

One caveat is that this change won't do anything to cleanup these
processes, but it will stop them from polluting the user's visible
workspace

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-23 21:36:17 +00:00
Michael Bolin
040976b218 tests: isolate approval fixtures from host rules (#18288)
## Why

Several approval-focused tests were unintentionally sensitive to
host-level rule files. On machines with broader allowed command
prefixes, commonly allowed commands such as `/bin/date` could bypass the
approval path these tests were meant to exercise, making the fixtures
depend on the developer or CI host configuration.

## What changed

- Pins the approval matrix fixture to the explicit user reviewer so it
does not inherit a host reviewer.
- Changes OTel approval fixtures to request `/usr/bin/touch
codex-otel-approval-test`, avoiding a command that may be pre-approved
by local rules.
- Clears the config layer stack for the permissions-message assertion
that needs to compare only the permissions text under test.

## Verification

- `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core --test
all approval_matrix_covers_all_modes -- --nocapture`
- `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core --test
all permissions_messages -- --nocapture`
2026-04-23 14:12:09 -07:00
Abhinav
dc5cf1ff78 Mark hooks schema fixtures as generated (#19194)
## Summary
- mark generated hooks schema fixture JSON as linguist-generated
- keep the app-server protocol generated schema marking unchanged

## Validation
- `git check-attr linguist-generated --
codex-rs/hooks/schema/generated/post-tool-use.command.output.schema.json`

Co-authored-by: Codex <noreply@openai.com>
2026-04-23 14:11:16 -07:00
Eric Traut
a50cb205b7 Stabilize plugin MCP tools test (#19191)
## Summary

The plugin MCP tool-listing test could hide MCP startup failures by
polling `ListMcpTools` until its own 30s deadline. If the plugin MCP
server startup had already failed or timed out, the session-owned MCP
manager would keep returning an empty tool list, so CI only reported
`discovered tools: []` instead of the startup state that mattered.

This makes the test synchronize on `McpStartupComplete` for the sample
plugin MCP server before asserting listed tools, and gives the
Bazel-launched test server a larger startup window.

## Notes

Confidence is about 80%. The source path strongly supports the RCA: a
failed MCP startup is represented as an empty tool list through
`ListMcpTools`, so the old polling contract could not distinguish "not
ready yet" from "startup already failed." I could not retrieve the CI
execution-log artifact to confirm the exact hidden startup error, but
the observed Ubuntu Bazel failure matches this path: repeated
`ListMcpTools` responses with no tools until the test-local timeout
fired.

I think this is the right solution because it keeps plugin behavior
unchanged and fixes only the test contract. Future startup failures
should now report the `McpStartupComplete` failure/cancellation instead
of timing out on an empty tool snapshot.

This test was introduced in https://github.com/openai/codex/pull/12864.
2026-04-23 14:08:40 -07:00
Eric Traut
3f8c06e457 Fix /review interrupt and TUI exit wedges (#18921)
Addresses #11267

## Summary
`/review` can be interrupted while it is still spawning the review
sub-agent. That spawn path lives in `codex-core` and did not observe the
task cancellation token until after `Codex::spawn` returned, so an
interrupted review could keep building a child session and leave the TUI
in a wedged state.

The TUI exit path also waited indefinitely for app-server
`thread/unsubscribe`, which made Ctrl+C look broken if the app-server
was already stuck. This makes interactive delegate startup
cancellation-aware and bounds the TUI shutdown-first unsubscribe wait
with a short UI escape-hatch timeout.

## Testing
I reproed the hang using the steps in the bug report. Confirmed hang no
longer exists after fix.
2026-04-23 13:28:12 -07:00
Eric Traut
cccc1b618e Stabilize approvals popup disabled-row test (#19178)
## Summary

The Windows Bazel job has been failing in
`chatwidget::tests::permissions::approvals_popup_navigation_skips_disabled`
because the test assumed a fixed approvals popup row order and shortcut
for the disabled permissions option. The approvals popup can include
platform-specific rows, so those assumptions made the test brittle.

This updates the test to derive the disabled row shortcut from the
rendered popup and assert navigation continues to skip disabled rows
before checking that disabled numeric shortcuts do not close or accept
the popup.
2026-04-23 13:21:35 -07:00
iceweasel-oai
d169bb541e use a version-specific suffix for command runner binary in .sandbox-bin (#19180)
we copy `codex-command-runner.exe` into `CODEX_HOME/.sandbox-bin/` so
that it can be executed by the sandbox user.
We also detect if that version is stale and copy a new one in if so.

This can fail when you are running multiple versions of the app - the
file in `.sandbox-bin` can look stale because it comes from another app
build.

This change allows us to have multiple versions in there for different
CLI versions, and it fallsback to a `size+mtime` hash in the filename
for dev builds that don't report a real CLI version.
2026-04-23 13:16:26 -07:00
xli-oai
0d6a90cd6b Add app-server marketplace upgrade RPC (#19074)
## Summary
- add a v2 `marketplace/upgrade` app-server RPC that mirrors the
existing configured Git marketplace upgrade path
- expose typed request/response/error payloads and regenerate
JSON/TypeScript schema fixtures
- add app-server integration coverage for all, named, already
up-to-date, and invalid marketplace upgrade requests

## Tests
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server marketplace_upgrade`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fmt`
2026-04-23 13:00:46 -07:00
Michael Bolin
491a3058f6 fix(exec-server): retain output until streams close (#18946)
## Why

A Mac Bazel run hit a flake in
`server::handler::tests::output_and_exit_are_retained_after_notification_receiver_closes`
where the read path observed process exit but lost the expected buffered
stdout (`first\nsecond\n`). See the [GitHub Actions
job](https://github.com/openai/codex/actions/runs/24758468552/job/72436716505)
and [BuildBuddy
invocation](https://app.buildbuddy.io/invocation/37475a12-4ef2-45fb-ab8a-e49a2aba1d59).

The underlying race is that process exit is not the same thing as
stdout/stderr closure. If a child or grandchild inherits the pipe write
end, or a process duplicates it with `dup2`, the watched process can
exit while the stream is still open and more output can still arrive.
The exec-server was starting exited-process retention cleanup from the
exit event, so the process entry could be removed before the output
streams had actually closed.

While stress-testing the exec-server unit suite,
`server::handler::tests::long_poll_read_fails_after_session_resume`
exposed a separate test race: it started a short-lived command that
could exit and wake the pending long-poll read before the session-resume
assertion observed the resumed-session error. That test is intended to
cover resume eviction, not process-exit delivery, so this change keeps
the process alive and quiet while the second connection resumes the
session.

## What changed

- Keep exec-server process entries retained until stdout/stderr streams
close, then start the post-exit retention timer from the closed event.
- Wake long-poll readers when the closed event is emitted.
- Add focused `local_process` unit coverage that proves late output is
still retained after the short test retention interval has elapsed, and
that closed process entries are eventually evicted.
- Add a local and remote regression test where a parent exits while a
child keeps inherited stdout open. The child waits on an explicit
release file, so the test deterministically observes exit first,
releases the child, then requires a nonzero-wait read from the exit
sequence to receive the late output.
- In `codex-rs/exec-server/src/server/handler/tests.rs`, make
`long_poll_read_fails_after_session_resume` run a long-lived silent
command instead of a short command that prints and exits. This isolates
the test to session-resume behavior and prevents a normal process exit
from satisfying the pending long-poll read first.

## Testing

- `cargo test -p codex-exec-server
exec_process_retains_output_after_exit_until_streams_close`
- `cargo test -p codex-exec-server local_process::tests`
- `cargo test -p codex-exec-server`
- `just fix -p codex-exec-server`
- `bazel test //codex-rs/exec-server:exec-server-unit-tests
//codex-rs/exec-server:exec-server-exec_process-test
//codex-rs/exec-server:exec-server-file_system-test
//codex-rs/exec-server:exec-server-http_client-test
//codex-rs/exec-server:exec-server-initialize-test
//codex-rs/exec-server:exec-server-process-test
//codex-rs/exec-server:exec-server-websocket-test`
- `bazel test --runs_per_test=25
//codex-rs/exec-server:exec-server-unit-tests`

## Documentation

No docs update needed; this is an internal exec-server correctness fix.
2026-04-23 19:49:58 +00:00
Michael Bolin
9c0eced391 shell-escalation: carry resolved permission profiles (#18287)
## Why

Shell escalation still has adapter code that expects a legacy sandbox
policy, but command approvals should carry the resolved
`PermissionProfile` so callers can reason about the granted permissions
canonically.

## What changed

This introduces profile-shaped resolved escalation permissions while
retaining the derived legacy sandbox policy for the Unix escalation
adapter. It updates approval types, the escalation server protocol, and
tests that inspect escalated command permissions.

## Verification

- `cargo test -p codex-core --test all handle_container_exec_ --
--nocapture`
- `cargo test -p codex-core --test all handle_sandbox_ -- --nocapture`

























































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18287).
* #18288
* __->__ #18287
2026-04-23 12:46:19 -07:00
cassirer-openai
6d09b6752d [rollout_trace] Trace tool and code-mode boundaries (#18878)
## Summary

Extends rollout tracing across tool dispatch and code-mode runtime
boundaries. This records canonical tool-call lifecycle events and links
code-mode execution/wait operations back to the model-visible calls that
caused them.

## Stack

This is PR 3/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This PR is about attribution. Reviewers should focus on whether direct
tool calls, code-mode-originated tool calls, waits, outputs, and
cancellation boundaries are recorded with enough source information for
deterministic reduction without coupling the reducer to live runtime
internals.

The stack remains valid after this layer: tool and code-mode traces
reduce through the existing crate model, while the broader session and
multi-agent relationships are added in the next PR.
2026-04-23 12:22:11 -07:00
Michael Bolin
ff22982d75 mcp: include permission profiles in sandbox state (#18286)
## Why

MCP tool calls can receive a serialized `SandboxState` when a server
declares the sandbox-state capability. That state is one of the places
MCP runtimes learn what permissions Codex is operating under. As the
permissions migration makes `PermissionProfile` the canonical
representation, MCP consumers should be able to read that profile
directly instead of reconstructing permissions from the legacy
`SandboxPolicy`.

## What changed

- Adds optional `permissionProfile` to `codex_mcp::SandboxState`, while
keeping `sandboxPolicy` for existing MCP consumers.
- Populates `permissionProfile` from the current `TurnContext` when
serializing sandbox state for MCP tool calls.

## Verification

- Current GitHub Actions for this PR are passing.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18286).
* #18288
* #18287
* __->__ #18286
2026-04-23 12:21:26 -07:00
Michael Bolin
f90cc0ee64 tui: carry permission profiles on user turns (#18285)
## Why

Per-turn permission overrides should use the same canonical profile
abstraction as session configuration. That lets TUI submissions preserve
exact configured permissions without round-tripping through legacy
sandbox fields.

## What changed

This adds `permission_profile` to user-turn operations, threads it
through TUI/app-server submission paths, fills the new field in existing
test fixtures, and adds coverage that composer submission includes the
configured profile.

## Verification

- `cargo test -p codex-tui permissions -- --nocapture`
- `cargo test -p codex-core --test all permissions_messages --
--nocapture`























































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18285).
* #18288
* #18287
* #18286
* __->__ #18285
2026-04-23 11:54:17 -07:00
Rasmus Rygaard
f11583b8f6 Add remote thread config endpoint (#18908)
## Why

App-server needs a way to fetch thread-scoped config from the remote
thread config service when the user config opts into that behavior. This
mirrors the existing experimental remote thread store endpoint while
keeping local/noop behavior as the default.

Startup paths also need to avoid silently dropping the remote config
endpoint after the first config load. The stdio app-server path
discovers the endpoint from the initial config and installs the real
thread config loader for later config builds, while in-process clients
used by TUI/exec now select the same remote loader directly from their
provided config.

## What changed

- Added `experimental_thread_config_endpoint` to `ConfigToml`, `Config`,
and `core/config.schema.json`.
- Added config parsing coverage for the new setting.
- Updated app-server startup to select `RemoteThreadConfigLoader` from
the initially loaded config, falling back to `NoopThreadConfigLoader`
when unset.
- Let `ConfigManager` replace its thread config loader after startup
discovery so later config loads use the selected loader.
- Updated in-process app-server client startup to pass
`RemoteThreadConfigLoader` when its config has
`experimental_thread_config_endpoint` set.

## Verification

- Added `experimental_thread_config_endpoint_loads_from_config_toml`.
- Added
`runtime_start_args_use_remote_thread_config_loader_when_configured`.
- Ran `cargo check -p codex-app-server --lib`.
- Ran `cargo test -p codex-app-server-client`.
2026-04-23 11:46:06 -07:00
maja-openai
cff337e4e3 Use Auto-review wording for fallback rationale (#19168)
## Why

PR #18797 currently surfaces fallback rationale text that names Guardian
directly.

## What changed

- Updated the bare allow and bare deny fallback rationales in
`codex-rs/core/src/guardian/prompt.rs` from Guardian to Auto-review.
- Updated the existing bare allow parser test and added explicit bare
deny parser coverage.

## Verification

- `cargo test -p codex-core parse_guardian_assessment_treats_bare`
2026-04-23 11:42:43 -07:00
xl-openai
198eddd25d Move marketplace add/remove and startup sync out of core. (#19099)
Move more things to core-plugins.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-23 11:27:17 -07:00
Ruslan Nigmatullin
e9165b9f40 ci: add macOS keychain entitlements (#19167)
## Summary

- add macOS application and team identifiers to the release signing
entitlements
- add a Codex keychain access group for release-signed macOS binaries
- keep the existing JIT entitlement unchanged

## Why

Codex release binaries are signed with the OpenAI Developer ID team, but
the current entitlements plist only grants JIT. macOS Keychain and
Secure Enclave operations that create persistent keys can require the
process to carry an application identifier and keychain access group.
Adding these entitlements gives release-signed binaries a stable
Keychain namespace for Codex-owned device keys.

## Validation

- `plutil -lint
.github/actions/macos-code-sign/codex.entitlements.plist`
2026-04-23 11:20:58 -07:00
Ruslan Nigmatullin
8a0ab3fc13 app-server: add Unix socket transport (#18255)
## Summary
- add unix:// app-server transport backed by the shared codex-uds crate
- reuse the websocket connection loop for axum and tungstenite-backed
streams
- add codex app-server proxy to bridge stdio clients to the control
socket
- tolerate Windows UDS backends that report a missing rendezvous path as
connection refused before binding

## Tests
- cargo test -p codex-app-server
control_socket_acceptor_forwards_websocket_text_messages_and_pings
- cargo test -p codex-app-server
- just fmt
- just fix -p codex-app-server
- git -c core.fsmonitor=false diff --check
2026-04-23 11:09:25 -07:00
Eric Traut
c2423f42d1 Respect explicit untrusted project config (#18626)
## Why

Fixes #18475. A `-c` override such as `projects.<cwd>.trust_level =
"untrusted"` is meant to be a runtime config override, but app-server
thread startup treated any non-trusted project as eligible for automatic
trust persistence when a permissive sandbox/cwd was requested. That
meant an explicit `untrusted` session override could still cause
`config.toml` to be updated with `trusted`.

## What changed

The app-server auto-trust path now runs only when the active project
trust level is unknown. Explicit `trusted` and explicit `untrusted`
values are both respected, regardless of whether they came from
persisted config or session flags.

A focused `thread/start` test now covers the explicit `untrusted` case
with a permissive sandbox request.

## Verification

- `cargo test -p codex-app-server`
- `just fix -p codex-app-server`
2026-04-23 10:51:17 -07:00
Tom
f1061d9d07 [codex] Implement remote thread store methods (#19008) 2026-04-23 17:49:28 +00:00
Tom
f1923a38b1 [codex] Route live thread writes through ThreadStore (#18882)
Begin migrating the thread write codepaths to ThreadStore.

This starts using ThreadStore inside of core session code, not only in
the app server code.

Rework the interfaces around thread recording/persistence. We're left
with the following:

* `ThreadManager`: owns the process-level registry of loaded threads and
handles cross-thread orchestration: start, resume, fork, lookup, remove,
and route ops to running CodexThreads.
* `CodexThread`: represents one loaded/running thread from the outside.
It is the handle app-server and callers use to submit ops, inspect
session metadata, and shut the thread down.
* `LiveThread`: session-owned persistence lifecycle handle for one
active thread. Core session code uses it to append rollout items,
materialize lazy persistence, flush, shutdown, discard init-failed
writers, and load that thread’s persisted history.
* `ThreadStore`: storage backend abstraction. It answers “how are
threads persisted, read, listed, updated, archived?” Local and remote
implementations live behind this trait.
* `LocalThreadStore`: local ThreadStore implementation. It owns the
file/sqlite-specific details and keeps RolloutRecorder as a local
implementation detail.

This is a few too many Thread abstractions for my liking, but they do
all represent different concepts / needs / layers.

Migration note: in places where the core code explicitly requires a
path, rather than a thread ID, throw an error if we're running with a
remote store.

Cover the new local live-writer lifecycle with focused tests and
preserve app-server thread-start behavior, including ephemeral pathless
sessions.
2026-04-23 10:17:09 -07:00
David de Regt
3d3028a5a9 Add excludeTurns parameter to thread/resume and thread/fork (#19014)
For callers who expect to be paginating the results for the UI, they can
now call thread/resume or thread/fork with excludeturns:true so it will
not fetch any pages of turns, and instead only set up the subscription.
That call can be immediately followed by pagination requests to
thread/turns/list to fetch pages of turns according to the UI's current
interactions.
2026-04-23 10:07:59 -07:00
Rasmus Rygaard
0b4f694347 Add remote thread config loader protos (#18892)
## Why

Thread-scoped config needs a stable boundary between the app/session
owner and the config stack. Instead of having call sites manually copy
thread config fields into individual overrides, this adds the proto and
Rust plumbing needed for a `ThreadConfigLoader` implementation to return
typed sources that can be translated into ordinary config layer entries.

Keeping the remote payload typed also makes precedence easier to reason
about: session-owned thread config maps back to the existing session
config source, while user-owned thread config is represented separately
without introducing a new config-layer source until it has TOML-backed
fields.

## What changed

- Added the `codex.thread_config.v1` protobuf service and generated Rust
module for loading thread config sources.
- Added `RemoteThreadConfigLoader`, which calls the gRPC service, parses
`SessionThreadConfig` / `UserThreadConfig`, and validates provider
fields such as `wire_api`, auth timeout, and absolute auth cwd.
- Added proto generation tooling under
`config/scripts/generate-proto.sh` and
`config/examples/generate-proto.rs`.
- Added `ThreadConfigLoader::load_config_layers`, plus static/no-op
loader helpers, so tests and callers can use the same typed loader
interface while config-layer translation stays centralized.

## Verification

- `cargo test -p codex-config thread_config`
2026-04-23 10:06:05 -07:00
jif-oai
a2f868c9d6 feat: drop spawned-agent context instructions (#19127)
## Why

MultiAgentV2 children should not receive an extra model-visible
developer fragment just because they were spawned. The parent/configured
developer instructions should carry through normally, but the dedicated
`<spawned_agent_context>` block is no longer desired.

## What changed

- Removed the `SpawnAgentInstructions` context fragment and its
`<spawned_agent_context>` wrapper.
- Stopped appending spawned-agent instructions in
`codex-rs/core/src/tools/handlers/multi_agents_v2/spawn.rs`.
- Updated subagent notification coverage to assert inherited parent
developer instructions without expecting the spawned-agent wrapper.

## Verification

- `cargo test -p codex-core --test all
spawned_multi_agent_v2_child_inherits_parent_developer_context --
--nocapture`
- `cargo test -p codex-core --test all
skills_toggle_skips_instructions_for_parent_and_spawned_child --
--nocapture`
- `cargo test -p codex-core --test all subagent_notifications --
--nocapture`
2026-04-23 18:54:45 +02:00
xli-oai
e18bfeec91 [codex] Fix plugin marketplace help usage (#18710)
## Summary
- Updates generated CLI help for plugin marketplace commands to show the
full `codex plugin marketplace ...` namespace.
- Adds a regression test covering the marketplace command and its `add`,
`upgrade`, and `remove` help pages.

## Root Cause
The marketplace parser already lived under `codex plugin marketplace`,
but Clap generated usage text from the child parser's standalone command
name. That made help output show stale `codex marketplace ...`
instructions even though the top-level `codex marketplace` command no
longer parses.

## Validation
- `just fmt`
- `cargo test -p codex-cli`
- `./target/debug/codex plugin marketplace --help`
2026-04-23 09:48:37 -07:00
Michael Bolin
5c239ad748 tui: sync session permission profiles (#18284)
## Why

Once `SessionConfigured` carries the active `PermissionProfile`, the TUI
must treat that as authoritative session state. Otherwise the widget can
keep stale local permission details after a session is configured or
resumed.

The TUI also keeps a local `Config` copy used for later operations, so
session-sourced profiles and subsequent local sandbox changes need to
keep the derived split runtime permissions in sync. Because this PR may
land before the follow-up user-turn profile plumbing, embedded
app-server turns also need a standalone path for carrying local runtime
sandbox overrides.

## What changed

- Sync the chat widget runtime filesystem/network permissions from
`SessionConfigured.permission_profile`, with the legacy `sandbox_policy`
as the fallback.
- Recompute split runtime permissions whenever the TUI applies or
carries forward a local sandbox-policy override.
- Mark feature-driven Auto-review sandbox changes as runtime sandbox
overrides so the standalone embedded turn-start profile path is used
even without the follow-up user-turn profile PR.
- Send a turn-start `permissionProfile` for embedded,
non-ExternalSandbox turns when the TUI has a runtime sandbox override;
remote and ExternalSandbox turns keep using the legacy sandbox field.
- Extend coverage for profile sync, local sandbox changes,
ExternalSandbox fallback, feature-driven sandbox overrides, and
turn-start permission override selection.

## Verification

- `cargo test -p codex-tui
update_feature_flags_enabling_guardian_selects_auto_review`
- `cargo test -p codex-tui
turn_start_permission_overrides_send_profiles_only_for_embedded_runtime_overrides`
- `cargo test -p codex-tui permission_settings_sync`
- `cargo test -p codex-tui
session_configured_external_sandbox_keeps_external_runtime_policy`
- `cargo test -p codex-tui
session_configured_syncs_widget_config_permissions_and_cwd`
- `just fix -p codex-tui`



---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18284).
* #18288
* #18287
* #18286
* #18285
* __->__ #18284
2026-04-23 09:47:53 -07:00
Eric Traut
1fda843fbc Update safety check wording (#19149)
Updates wording of cyber safety check.
2026-04-23 08:53:25 -07:00
jif-oai
45e1742030 exec-server: wait for close after observed exit (#19130)
## Why

Windows CI can flake in
`server::handler::tests::output_and_exit_are_retained_after_notification_receiver_closes`
after a process has exited but before both output streams have closed.
`exec/read` returned immediately whenever `exited` was true, so callers
that had already observed the exit event could spin instead of
long-polling for the later `closed` state.

## What Changed

- Keep returning immediately when a terminal exit event is newly
observable.
- Allow later reads, after the caller has advanced past that event, to
wait for `closed` or new output until `wait_ms` expires.

## Verification

- CI pending.
2026-04-23 16:50:17 +02:00
jif-oai
d3b044938d Reject agents.max_threads with multi_agent_v2 (#19129)
## Why

`multi_agent_v2` uses the v2 agent lifecycle, so accepting the legacy
`agents.max_threads` limit alongside it creates conflicting
configuration semantics. Config load should fail early with a clear
error instead of allowing both knobs to be set.

## What Changed

- During config load, detect when the effective `multi_agent_v2` feature
is enabled and `agents.max_threads` is explicitly set.
- Return an `InvalidInput` error: `agents.max_threads cannot be set when
multi_agent_v2 is enabled`.

## Verification

- `cargo test -p codex-core multi_agent_v2_rejects_agents_max_threads`
passed locally with a temporary focused test for this behavior.
- `cargo test -p codex-core` was also run; the new focused path passed,
but the crate suite has unrelated pre-existing failures in managed
config/proxy/request-permissions tests.
2026-04-23 13:31:54 +02:00
Won Park
17ae906048 Fix auto-review config compatibility across protocol and SDK (#19113)
## Why

This keeps the partial Guardian subagent -> Auto-review rename
forward-compatible across mixed Codex installations. Newer binaries need
to understand the new `auto_review` spelling, but they cannot write it
to shared `~/.codex/config.toml` yet because older CLI/app-server
bundles only know `user` and `guardian_subagent` and can fail during
config load before recovering.

The Python SDK had the opposite compatibility gap: app-server responses
can contain `approvalsReviewer: "auto_review"`, but the checked-in
generated SDK enum did not accept that value.

## What Changed

- Keep `ApprovalsReviewer::AutoReview` readable from both
`guardian_subagent` and `auto_review`, while serializing it as
`guardian_subagent` in both protocol crates.
- Update TUI Auto-review persistence tests so enabling Auto-review
writes `approvals_reviewer = "guardian_subagent"` while UI copy still
says Auto-review.
- Map managed/cloud `feature_requirements.auto_review` to the existing
`Feature::GuardianApproval` gate without adding a broad local
`[features].auto_review` key or changing config writes.
- Add `auto_review` to the Python SDK `ApprovalsReviewer` enum and cover
`ThreadResumeResponse` validation.

## Testing

- `cargo test -p codex-protocol approvals_reviewer`
- `cargo test -p codex-app-server-protocol approvals_reviewer`
- `cargo test -p codex-tui
update_feature_flags_enabling_guardian_selects_auto_review`
- `cargo test -p codex-tui
update_feature_flags_enabling_guardian_in_profile_sets_profile_auto_review_policy`
- `cargo test -p codex-core
feature_requirements_auto_review_disables_guardian_approval`
- `pytest
sdk/python/tests/test_client_rpc_methods.py::test_thread_resume_response_accepts_auto_review_reviewer`
- `git diff --check`
2026-04-23 03:12:56 -07:00
Abhinav
305825abd9 Support MCP tools in hooks (#18385)
## Summary

Lifecycle hooks currently treat `PreToolUse`, `PostToolUse`, and
`PermissionRequest` as Bash-only flows
- hook schema constrains `tool_name` to `Bash`
- hook input assumes a command-shaped `tool_input`
- core hook dispatch path passes only shell command strings

That means hooks cannot target MCP tools even though MCP tool names are
model-visible and stable

This change generalizes those hook paths so they can match and receive
payloads for MCP tools while preserving the existing Bash behavior.

## Reviewer Notes

I think these are the key files
- `codex-rs/core/src/tools/handlers/mcp.rs`
- `codex-rs/core/src/mcp_tool_call.rs`

Otherwise the changes across apply_patch, shell, and unified_exec are
mainly to rewire everything to be `tool_input` based instead of just
`command` so that it'll make sense for MCP tools.

## Changes

- Allow `PreToolUse`, `PostToolUse`, and `PermissionRequest` hook inputs
to carry arbitrary `tool_name` and `tool_input` values instead of
hard-coding `Bash` and command-only payloads.
- Add MCP hook payload support through `McpHandler`, using the
model-visible tool name from `ToolInvocation` and the raw MCP arguments
as `tool_input`.
- Include MCP tool responses in `PostToolUse` by serializing
`McpToolOutput` into the hook response payload.
- Run `PermissionRequest` hooks for MCP approval requests after
remembered approval checks and before falling back to user-facing MCP
elicitation.
- Preserve exact matching for literal hook matchers like `Bash` and
`mcp__memory__create_entities`, while keeping regex matcher support for
patterns like `mcp__memory__.*` and `mcp__.*__write.*`.

---------

Co-authored-by: Andrei Eternal <eternal@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-04-23 07:33:57 +00:00
Michael Bolin
8bc667b07b app-server: include filesystem entries in permission requests (#19086)
## Why

`item/permissions/requestApproval` sends a requested permission profile
to app-server clients. The core profile already stores filesystem
permissions as `entries`, but the v2 compatibility conversion used the
legacy `read`/`write` projection whenever possible and left `entries`
unset.

That made the request ambiguous for clients that consume the canonical
v2 shape: `permissions.fileSystem.entries` was missing even though
filesystem access was being requested. A client that rendered or echoed
grants from `entries` could treat the request as having no filesystem
permission entries, then return an empty or incomplete grant. The
app-server intersects responses with the original request, so omitted
filesystem permissions are denied.

## What Changed

- Populate `AdditionalFileSystemPermissions.entries` when converting
legacy read/write roots for request permission payloads, while
preserving `read` and `write` for compatibility.
- Mark `read` and `write` as transitional schema fields in the generated
app-server schema.
- Add regression coverage for the v2 conversion, the app-server
`item/permissions/requestApproval` round trip, and TUI app-server
approval conversion expectations.
- Refresh generated JSON and TypeScript schema fixtures.

## Verification

- `just fmt`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server request_permissions_round_trip`
- `cargo test -p codex-tui
converts_request_permissions_into_granted_permissions`
- `cargo test -p codex-tui
resolves_permissions_and_user_input_through_app_server_request_id`
2026-04-23 00:21:59 -07:00
Shijie Rao
993e3f407e Persist target default reasoning on model upgrade (#19085)
## Why

When the TUI upgrade flow moves a user to a newer model, the accepted
migration should also persist the target model's default reasoning
effort. That keeps the upgraded model and reasoning setting aligned
instead of carrying forward a stale previously saved effort from the old
model.

## What changed

- The accepted model migration path now updates in-memory config, TUI
state, and persisted model selection with the target preset's
`default_reasoning_effort`.
- The upgrade destructuring keeps `reasoning_effort_mapping` explicitly
unused because mappings are no longer consulted on accepted migrations.
- Added a catalog test that starts with a pre-existing saved reasoning
effort and verifies the accepted upgrade overwrites it with the target
model default and emits the expected persistence events.
- Rebasing onto current `main` also updates a TUI thread-session test
helper for the latest `permission_profile` field and
`ApprovalsReviewer::AutoReview` rename so CI compiles on the new base.

## Verification

- `cargo test -p codex-tui model_catalog`
- `cargo test -p codex-tui
permission_settings_sync_updates_active_snapshot_without_rewriting_side_thread`
2026-04-22 23:36:15 -07:00
Gav Verma
2ef2d675d6 Clarify cloud requirements error messages (#19078)
## Why
The current cloud-requirements failures say `workspace-managed config`,
which is ambiguous and can read like it refers to local managed config
such as `managed_config.toml`.

This code path only applies to cloud requirements, so the user-facing
message should name that source directly.

## What changed
- Updated the load failure in
[`codex-rs/cloud-requirements/src/lib.rs`](46e704d1f9/codex-rs/cloud-requirements/src/lib.rs)
to say `failed to load cloud requirements (workspace-managed policies)`.
- Updated the parse failure in the same file to use the same `cloud
requirements (workspace-managed policies)` terminology.
- Kept `workspace-managed` hyphenated because it is used as a compound
modifier.
- Updated the matching assertion in
[`codex-rs/app-server/src/codex_message_processor.rs`](46e704d1f9/codex-rs/app-server/src/codex_message_processor.rs).
- Reused `CLOUD_REQUIREMENTS_LOAD_FAILED_MESSAGE` in the
`codex-cloud-requirements` test where the test is asserting that
crate-local contract directly.

## Testing
`cargo test -p codex-cloud-requirements`
2026-04-22 23:07:08 -07:00
xl-openai
951be1a8a1 feat: Warn and continue on unknown feature requirements (#19038)
Requirements feature flags now fail open like config feature flags, but
with a startup warning.

<img width="443" height="68" alt="image"
src="https://github.com/user-attachments/assets/76767fa7-8ce8-4fc7-8a09-902fcdda6298"
/>
2026-04-22 22:50:44 -07:00
xl-openai
fb6308cf64 Use remote plugin IDs for detail reads and enlarge list pages (#19079)
1. For remote plugin use plugin id (plugin name) directly for read
plugin details;
2. Request up to 200 remote plugins per directory list page.
2026-04-22 22:50:20 -07:00
Leo Shimonaka
7730fb3ab8 Add computer_use feature requirement key (#19071)
## Summary
- add the `computer_use` requirements-only feature key
- include it in generated config schema output
- cover the new key in feature metadata tests

## Testing
- `cargo test -p codex-features`
- `just write-config-schema`
- `just fmt`
- `just fix -p codex-features`

cc @xl-openai

---------

Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
2026-04-22 22:49:26 -07:00
Eric Traut
08b5e96678 TUI: preserve permission state after side conversations (#18924)
Addresses #18854

## Why

The `/permissions` selector updates the active TUI session state, but
the cached session snapshot used when replaying a thread could still
contain the old approval or sandbox settings. After opening and leaving
`/side`, the main thread replay could restore those stale settings into
the `ChatWidget`, so the UI and the next submitted turn could fall back
to the old permission mode.

## What

- Sync the active thread's cached `ThreadSessionState` whenever approval
policy, sandbox policy, or approval reviewer changes.

## Verification

Confirmed bug prior to fix and correct behavior after fix.
2026-04-22 22:40:35 -07:00
Abhinav
23afa173f4 Mark codex_hooks stable (#19012)
# Why

Hooks are ready to graduate to GA in the next release!

# What

- Moves `Feature::CodexHooks` into the stable feature group.
- Marks the `codex_hooks` feature spec as `Stage::Stable` and
default-enabled.
2026-04-23 05:34:05 +00:00
Michael Bolin
9d824cf4b4 app-server: accept command permission profiles (#18283)
## Why

`command/exec` is another app-server entry point that can run under
caller-provided permissions. It needs to accept `PermissionProfile`
directly so command execution is not left behind on `SandboxPolicy`
while thread APIs move forward.

Command-level profiles also need to preserve the semantics clients
expect from profile-relative paths. `:cwd` and cwd-relative deny globs
should be anchored to the resolved command cwd for a command-specific
profile, while configured deny-read restrictions such as `**/*.env =
none` still need to be enforced because they can come from config or
requirements rather than the command override itself.

## What Changed

This adds `permissionProfile` to `CommandExecParams`, rejects requests
that combine it with `sandboxPolicy`, and converts accepted profiles
into the runtime filesystem/network permissions used for command
execution.

When a command supplies a profile, the app-server resolves that profile
against the command cwd instead of the thread/server cwd. It also
preserves configured deny-read entries and `globScanMaxDepth` on the
effective filesystem policy so one-off command overrides cannot drop
those read protections. The PR also updates app-server docs/schema
fixtures and adds command-exec coverage for accepted, rejected,
cwd-scoped, and deny-read-preserving profile paths.

## Verification

- `cargo test -p codex-app-server
command_exec_permission_profile_cwd_uses_command_cwd`
- `cargo test -p codex-app-server
command_profile_preserves_configured_deny_read_restrictions`
- `cargo test -p codex-app-server
command_exec_accepts_permission_profile`
- `cargo test -p codex-app-server
command_exec_rejects_sandbox_policy_with_permission_profile`
- `just fix -p codex-app-server`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18283).
* #18288
* #18287
* #18286
* #18285
* #18284
* __->__ #18283
2026-04-22 22:33:16 -07:00
Eric Traut
bbff4ee61a Add safety check notification and error handling (#19055)
Adds a new app-server notification that fires when a user account has
been flagged for potential safety reasons.
2026-04-22 22:24:12 -07:00
Shijie Rao
02170996e6 Default Fast service tier for eligible ChatGPT plans (#19053)
## Why

Enterprise and business-like ChatGPT plans should get Codex's Fast
service tier by default when the user or caller has not made an explicit
service-tier choice. At the same time, callers need a durable way to
choose standard routing without adding a new persisted `standard`
service tier value. This keeps existing config compatibility while
letting core own the managed default policy.

## What changed

- Resolve the effective service tier in core at session creation:
explicit `fast` or `flex` wins, explicit null/clear or
`[notice].fast_default_opt_out = true` resolves to standard routing, and
otherwise eligible ChatGPT plans resolve to Fast when FastMode is
enabled.
- Add `[notice].fast_default_opt_out` as the persisted opt-out marker
for managed Fast defaults.
- Treat app-server/TUI `service_tier: null` as an explicit
standard/clear choice by preserving that intent through config loading.
- Update TUI rendering to use core's effective service tier for startup
and status surfaces while still keeping `config.service_tier` as the
explicit configured choice.
- Update `/fast off` to clear `service_tier`, persist the opt-out
marker, and send explicit standard for subsequent turns.

## Verification

- Added unit coverage for config override/notice handling, service-tier
resolution, runtime null clearing, and `/fast off` turn propagation.
- `cargo build -p codex-cli`

Full test suite was not run locally per author request.
2026-04-22 21:54:44 -07:00
Michael Bolin
082fc4f632 protocol: report session permission profiles (#18282)
## Why

Clients that observe `SessionConfigured` need the same canonical
permission view that app-server thread responses provide. Reporting the
profile in protocol events lets clients keep their local state
synchronized without reinterpreting legacy sandbox fields.

## What changed

This adds `permission_profile` to `SessionConfigured` and propagates it
through core, exec JSON output, MCP server messages, and TUI
history/widget handling.

## Verification

- `cargo test -p codex-tui permissions -- --nocapture`
- `cargo test -p codex-core --test all permissions_messages --
--nocapture`















































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18282).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* __->__ #18282
2026-04-22 21:29:32 -07:00
Andrei Eternal
2b2de3f38b codex: support hooks in config.toml and requirements.toml (#18893)
## Summary

Support the existing hooks schema in inline TOML so hooks can be
configured from both `config.toml` and enterprise-managed
`requirements.toml` without requiring a separate `hooks.json` payload.

This gives enterprise admins a way to ship managed hook policy through
the existing requirements channel while still leaving script delivery to
MDM or other device-management tooling, and it keeps `hooks.json`
working unchanged for existing users.

This also lays the groundwork for follow-on managed filtering work such
as #15937, while continuing to respect project trust gating from #14718.
It does **not** implement `allow_managed_hooks_only` itself.

NOTE: yes, it's a bit unfortunate that the toml isn't formatted as
closely as normal to our default styling. This is because we're trying
to stay compatible with the spec for plugins/hooks that we'll need to
support & the main usecase here is embedding into requirements.toml

## What changed

- moved the shared hook serde model out of `codex-rs/hooks` into
`codex-rs/config` so the same schema can power `hooks.json`, inline
`config.toml` hooks, and managed `requirements.toml` hooks
- added `hooks` support to both `ConfigToml` and
`ConfigRequirementsToml`, including requirements-side `managed_dir` /
`windows_managed_dir`
- treated requirements-managed hooks as one constrained value via
`Constrained`, so managed hook policy is merged atomically and cannot
drift across requirement sources
- updated hook discovery to load requirements-managed hooks first, then
per-layer `hooks.json`, then per-layer inline TOML hooks, with a warning
when a single layer defines both representations
- threaded managed hook metadata through discovered handlers and exposed
requirements hooks in app-server responses, generated schemas, and
`/debug-config`
- added hook/config coverage in `codex-rs/config`, `codex-rs/hooks`,
`codex-rs/core/src/config_loader/tests.rs`, and
`codex-rs/core/tests/suite/hooks.rs`

## Testing

- `cargo test -p codex-config`
- `cargo test -p codex-hooks`
- `cargo test -p codex-app-server config_api`

## Documentation

Companion updates are needed in the developers website repo for:

- the hooks guide
- the config reference, sample, basic, and advanced pages
- the enterprise managed configuration guide

---------

Co-authored-by: Michael Bolin <mbolin@openai.com>
2026-04-22 21:20:09 -07:00
Michael Bolin
9955eacd22 tui: fix approvals popup disabled shortcut test (#19072)
## Why

This regressed in #19063, which made `GuardianApproval` stable and
enabled by default. That adds an enabled `Auto-review` row to the
permissions popup, but `approvals_popup_navigation_skips_disabled` still
assumed the disabled `Full Access` row lived behind a hard-coded numeric
shortcut, so the test started selecting a different row and closing the
popup instead of verifying disabled-row behavior.

## What

- disable `GuardianApproval` in
`approvals_popup_navigation_skips_disabled` so the popup layout matches
the scenario the test is exercising
- choose the hidden numeric shortcut for the disabled `Full Access` row
by platform (`2` on non-Windows, `3` on Windows where `Read Only` is
shown) before asserting that selecting the disabled row leaves the popup
open

## Testing

- `cargo test -p codex-tui --lib
chatwidget::tests::permissions::approvals_popup_navigation_skips_disabled
-- --exact --nocapture`
- `cargo test -p codex-tui --lib chatwidget::tests::permissions --
--nocapture`
- `cargo test -p codex-tui`
2026-04-22 21:01:26 -07:00
Michael Bolin
e8ba912fcc test: set Rust test thread stack size (#19067)
## Summary

Set `RUST_MIN_STACK=8388608` for Rust test entry points so
libtest-spawned test threads get an 8 MiB stack.

The Windows BuildBuddy failure on #18893 showed
`//codex-rs/tui:tui-unit-tests` exiting with a stack overflow in a
`#[tokio::test]` even though later test binaries in the shard printed
successful summaries. Default `#[tokio::test]` uses a current-thread
Tokio runtime, which means the async test body is driven on libtest's
std-spawned test thread. Increasing the test thread stack addresses that
failure mode directly.

To date, we have been fixing these stack-pressure problems with
localized future-size reductions, such as #13429, and by adding
`Box::pin()` in specific async wrapper chains. This gives us a baseline
test-runner stack size instead of continuing to patch individual tests
only after CI finds another large async future.

## What changed

- Added `common --test_env=RUST_MIN_STACK=8388608` in `.bazelrc` so
Bazel test actions receive the env var through Bazel's cache-keyed test
environment path.
- Set the same `RUST_MIN_STACK` value for Cargo/nextest CI entry points
and `just test`.
- Annotated the existing Windows Bazel linker stack reserve as 8 MiB so
it stays aligned with the libtest thread stack size.

## Testing

- `just --list`
- parsed `.github/workflows/rust-ci.yml` and
`.github/workflows/rust-ci-full.yml` with Ruby's YAML loader
- compared `bazel aquery` `TestRunner` action keys before/after explicit
`--test_env=RUST_MIN_STACK=...` and after moving the Bazel env to
`.bazelrc`
- `bazel test //codex-rs/tui:tui-unit-tests --test_output=errors`
- failed locally on the existing sandbox-specific status snapshot
permission mismatch, but loaded the Starlark changes and ran the TUI
test shards
2026-04-22 19:51:49 -07:00
Dylan Hurd
5e71da1424 feat(request-permissions) approve with strict review (#19050)
## Summary
Allow the user to approve a request_permissions_tool request with the
condition that all commands in the rest of the turn are reviewed by
guardian, regardless of sandbox status.

## Testing
- [x] Added unit tests
- [x] Ran locally
2026-04-23 01:56:32 +00:00
Dylan Hurd
c6ab601824 chore(auto-review) feature => stable (#19063)
## Summary
Turn on Auto Review

## Testing
- [x] Update unit tests
2026-04-22 18:51:39 -07:00
Matthew Zeng
8f0a92c1e5 Fix relative stdio MCP cwd fallback (#19031) 2026-04-22 17:52:17 -07:00
Michael Bolin
3cc3763e6c core: box multi-agent wrapper futures (#19059)
## Why

While debugging the Windows stack overflows we saw in
[#13429](https://github.com/openai/codex/pull/13429) and then again in
[#18893](https://github.com/openai/codex/pull/18893), I hit another
overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.

That test drives the legacy multi-agent spawn / close / resume path. The
behavior was fine, but several thin async wrappers were still inlining
much larger `AgentControl` futures into their callers, which was enough
to overflow the default Windows stack.

## What

- Box the thin `AgentControl` wrappers around `spawn_agent_internal`,
`resume_single_agent_from_rollout`, and `shutdown_agent_tree`.
- Box the corresponding legacy `multi_agents` handler calls in `spawn`,
`resume_agent`, and `close_agent`.
- Keep behavior unchanged while reducing future size on this call path
so the Windows test no longer overflows its stack.

## Testing

- `cargo test -p codex-core --lib
tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed
-- --exact --nocapture`
- `cargo test -p codex-core` (this still hit unrelated local
integration-test failures because `codex.exe` / `test_stdio_server.exe`
were not present in this shell; the relevant unit tests passed)
2026-04-22 17:48:13 -07:00
Ahmed Ibrahim
0e78ce80ee [3/4] Add executor-backed RMCP HTTP client (#18583)
### Why
The RMCP layer needs a Streamable HTTP client that can talk either
directly over `reqwest` or through the executor HTTP runner without
duplicating MCP session logic higher in the stack. This PR adds that
client-side transport boundary so remote Streamable HTTP MCP can reuse
the same RMCP flow as the local path.

### What
- Add a shared `rmcp-client/src/streamable_http/` module with:
  - `transport_client.rs` for the local-or-remote transport enum
  - `local_client.rs` for the direct `reqwest` implementation
  - `remote_client.rs` for the executor-backed implementation
  - `common.rs` for the small shared Streamable HTTP helpers
- Teach `RmcpClient` to build Streamable HTTP transports in either local
or remote mode while keeping the existing OAuth ownership in RMCP.
- Translate remote POST, GET, and DELETE session operations into
executor `http/request` calls.
- Preserve RMCP session expiry handling and reconnect behavior for the
remote transport.
- Add remote transport coverage in
`rmcp-client/tests/streamable_http_remote.rs` and keep the shared test
support in `rmcp-client/tests/streamable_http_test_support.rs`.

### Verification
- `cargo check -p codex-rmcp-client`
- online CI

### Stack
1. #18581 protocol
2. #18582 runner
3. #18583 RMCP client
4. #18584 manager wiring and local/remote coverage

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-22 17:38:04 -07:00
Won Park
83ec1eb5d6 Rename approvals reviewer variant to auto-review (#19056)
## Why

`approvals_reviewer` now uses `auto_review` as the canonical config/API
value after #18504, but the Rust enum variant and nearby helper/test
names still used `GuardianSubagent` / guardian approval wording. That
made follow-up code and reviews confusing even though the external value
had already moved to Auto-review.

## What changed

- Renamed `ApprovalsReviewer::GuardianSubagent` to
`ApprovalsReviewer::AutoReview`.
- Updated protocol, app-server, config, core, TUI, exec, and analytics
test callsites.
- Renamed nearby helper/test names from guardian approval wording to
Auto-review wording where they refer to the approvals reviewer mode.
- Preserved wire compatibility:
  - `auto_review` remains the canonical serialized value.
  - `guardian_subagent` remains accepted as a legacy alias.

This intentionally does not rename the `[features].guardian_approval`
key, `Feature::GuardianApproval`, `core/src/guardian`, analytics event
names, or app-server Guardian review event types.

## Verification

- `cargo test -p codex-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent`
- `cargo test -p codex-app-server-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent`
- `cargo test -p codex-config approvals_reviewer`
- `cargo test -p codex-tui update_feature_flags`
- `cargo test -p codex-core permissions_instructions`
- `cargo test -p codex-tui permissions_selection`
2026-04-22 17:22:35 -07:00
Andrei Eternal
eed0e07825 hooks: emit Bash PostToolUse when exec_command completes via write_stdin (#18888)
Fixes #16246.

## Why

`exec_command` already emits `PreToolUse`, but long-running unified exec
commands that finish on a later `write_stdin` poll could miss the
matching `PostToolUse`. That left the Bash hook lifecycle inconsistent,
broke expectations around `tool_use_id` and `tool_input.command`, and
meant `PostToolUse` block/replacement feedback could fail to replace the
final session output before it reached model context.

This keeps the fix scoped to the `exec_command` / `write_stdin`
lifecycle. Broader non-Bash hook expansion is still out of scope here
and remains tracked separately in #16732.

## What changed

- Compute and store `PostToolUsePayload` while handlers still have
access to their concrete output type, and carry `tool_use_id` through
that payload.
- Preserve the original hook-facing `exec_command` string through
unified exec state (`ExecCommandRequest`, `ProcessEntry`,
`PreparedProcessHandles`, and `ExecCommandToolOutput`) via
`hook_command`, and remove the now-unused `session_command` output
metadata.
- Emit exactly one Bash `PostToolUse` for long-running `exec_command`
sessions when a later `write_stdin` poll observes final completion,
using the original `exec_command` call id and hook-facing command.
- Keep one-shot `exec_command` behavior aligned with the same payload
construction, including interactive completions that return a final
result directly.
- Apply `PostToolUse` block/replacement feedback before the final
`write_stdin` completion output is sent back to the model.
- Keep `write_stdin` itself out of `PreToolUse` matching so it continues
to act as transport/polling for the original Bash tool call.
- Restore plain matcher behavior for tool-name matchers such as `Bash`
and `Edit|Write`, while still treating patterns with regex characters
(for example `mcp__.*`) as regexes.
- Add unit coverage for unified exec payload construction and parallel
session separation, plus a core integration regression that verifies a
blocked `PostToolUse` replaces the final `write_stdin` output in model
context.

## Testing

- `cargo test -p codex-hooks`
- `cargo test -p codex-core post_tool_use_payload`
- `cargo test -p codex-core
post_tool_use_blocks_when_exec_session_completes_via_write_stdin`
2026-04-22 17:14:22 -07:00
Michael Bolin
6ca038bbd1 rollout: persist turn permission profiles (#18281)
## Why

Resume and reconstruction need to preserve the permissions that were
active for each user turn. If rollouts only keep legacy sandbox fields,
replay cannot faithfully represent profile-shaped overrides introduced
earlier in the stack.

## What changed

This records `permission_profile` on user-turn rollout events,
reconstructs it through history/state extraction, and updates rollout
reconstruction and related fixtures to keep the field explicit.

## Verification

- `cargo test -p codex-core --test all permissions_messages --
--nocapture`
- `cargo test -p codex-core --test all request_permissions --
--nocapture`











































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18281).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* __->__ #18281
2026-04-22 17:00:29 -07:00
Michael Bolin
bc083e4713 clients: send permission profiles to app-server (#18280)
## Why

After app-server can accept `PermissionProfile`, first-party clients
should stop preferring legacy sandbox fields when canonical permission
information is available. This keeps the migration moving without
removing legacy compatibility yet.

The client side still has mixed surfaces during the stack: embedded
thread start/resume/fork and exec initial turns can derive a profile
directly from local config, while TUI remote sessions and some
turn-start paths only have a legacy/server-context-safe sandbox
projection. Those paths keep sending legacy sandbox fields rather than
synthesizing or sending lossy/local-only profiles.

## What changed

- Sends `permissionProfile` from exec and embedded TUI thread
start/resume/fork requests when config has a representable profile.
- Keeps legacy sandbox fallback for external sandbox policies, TUI
remote thread lifecycle requests, and TUI turn-start requests that do
not yet carry the active profile.
- Sends the actual config-derived `permissionProfile` for exec initial
turns instead of rebuilding one from the legacy sandbox projection.
- Stores response `permissionProfile` as optional in TUI session state
so external sandbox responses and compatibility payloads preserve
`null`.
- Updates tests for request construction and response mapping.

## Verification

- `cargo check --tests -p codex-tui -p codex-exec`
- `cargo test -p codex-tui app_server_session -- --nocapture`
- `cargo test -p codex-exec thread_start_params -- --nocapture`
- `cargo test -p codex-tui
app_server_session::tests::thread_lifecycle_params -- --nocapture`
- `just fix -p codex-tui -p codex-exec`
- `just fix -p codex-tui`












---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18280).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* __->__ #18280
2026-04-22 16:34:13 -07:00
Michael Bolin
44dbd9e48a exec-server: require explicit filesystem sandbox cwd (#19046)
## Why

This is a cleanup PR for the `PermissionProfile` migration stack. #19016
fixed remote exec-server sandbox contexts so Docker-backed filesystem
requests use a request/container `cwd` instead of leaking the local test
runner `cwd`. That exposed the broader API problem:
`FileSystemSandboxContext::new(SandboxPolicy)` could still reconstruct
filesystem permissions by reading the exec-server process cwd with
`AbsolutePathBuf::current_dir()`.

That made `cwd`-dependent legacy entries, such as `:cwd`,
`:project_roots`, and relative deny globs, depend on ambient process
state instead of the request sandbox `cwd`. As later PRs make
`PermissionProfile` the primary permissions abstraction, sandbox
contexts should be explicit about whether they carry a request `cwd` or
are profile-only. Removing the implicit constructor prevents new call
sites from accidentally rebuilding permissions against the wrong `cwd`.

## What changed

- Removed `FileSystemSandboxContext::new(SandboxPolicy)`.
- Kept production callers on explicit constructors:
`from_legacy_sandbox_policy(..., cwd)`, `from_permission_profile(...)`,
and `from_permission_profile_with_cwd(...)`.
- Updated exec-server test helpers to construct `PermissionProfile`
values directly instead of routing through legacy `SandboxPolicy`
projections.
- Updated the environment regression test to use an explicit restricted
profile with no synthetic `cwd`.

## Verification

- `cargo test -p codex-exec-server`
- `just fix -p codex-exec-server`


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19046).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* __->__ #19046
2026-04-22 23:05:12 +00:00
Won Park
46142c3cb0 Rebrand approvals reviewer config to auto-review (#18504)
### Why

Auto-review is the user-facing name for the approvals reviewer, but the
config/API value still exposed the old `guardian_subagent` name. That
made new configs and generated schemas point users at Guardian
terminology even though the intended product surface is Auto-review.

This PR updates the external `approvals_reviewer` value while preserving
compatibility for existing configs and clients.

### What changed

- Makes `auto_review` the canonical serialized value for
`approvals_reviewer`.
- Keeps `guardian_subagent` accepted as a legacy alias.
- Keeps `user` accepted and serialized as `user`.
- Updates generated config and app-server schemas so
`approvals_reviewer` includes:
  - `user`
  - `auto_review`
  - `guardian_subagent`
- Updates app-server README docs for the reviewer value.
- Updates analytics and config requirements tests for the canonical
auto_review value.


### Compatibility

Existing configs and API payloads using:

```toml
approvals_reviewer = "guardian_subagent"
```

continue to load and map to the Auto-review reviewer behavior. 

New serialization emits: 
```toml
approvals_reviewer = "auto_review" 
```

This PR intentionally does not rename the [features].guardian_approval
key or broad internal Guardian symbols. Those are split out for a
follow-up PR to keep this migration small and avoid touching large
TUI/internal surfaces.

**Verification**
cargo test -p codex-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent
cargo test -p codex-app-server-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent
2026-04-22 15:45:35 -07:00
Konstantine Kahadze
0e25c5ff42 Update bundled OpenAI Docs skill freshness check (#19043)
## Summary

Sync the bundled `openai-docs` system skill with the already-merged
`openai/skills` update from https://github.com/openai/skills/pull/360.

Codex bundles system skills from `codex-rs/skills/src/assets/samples`,
so this PR copies the same GPT-5.4 OpenAI Docs skill update into the
Codex app/CLI bundle path.

## Changes

- Add the latest-model resolver script to the bundled `openai-docs`
skill.
- Route model upgrade and prompt-upgrade requests through remote
latest-model metadata when current guidance is needed.
- Rename bundled fallback references to `upgrade-guide.md` and
`prompting-guide.md`.
- Keep the bundled fallback guidance GPT-5.4-only.

## Validation

- Verified this bundled skill is byte-for-byte identical to
`openai/skills@origin/main` `skills/.system/openai-docs`.
- Ran the resolver locally and confirmed it returns `gpt-5.4` /
`gpt-5p4`.
2026-04-22 22:31:04 +00:00
khoi
568cdacc7e [Codex] Register browser requirements feature keys (#18956)
## Summary
- register `in_app_browser` and `browser_use` as stable feature keys
- allow requirements/MDM feature requirements to pin those desktop
browser controls
- add coverage for browser requirements being accepted by config loading

## Testing
- `cargo fmt --all` (`just fmt` unavailable locally; rustfmt warned
about nightly-only `imports_granularity` config)
- `cargo test -p codex-features`
- `cargo test -p codex-core browser_feature_requirements_are_valid`
- Tested manually by setting in `requirements.toml` and seeing after app
restart state to reflect the setting was correct (at the time hiding the
`Browser Use` setting when the enterprise setting was set to false
2026-04-22 15:27:15 -07:00
joeytrasatti-openai
ee70b365ab Overlay state DB git metadata for filtered thread lists (#19036)
## Summary
- Factor the state DB `ThreadMetadata` to rollout `ThreadItem` mapping
into a shared helper used by both DB pages and filesystem overlays
- Generalize filtered filesystem list overlays to fill missing thread
list metadata from the state-derived `ThreadItem`, while preserving
filesystem `path` and `thread_id`
- Add coverage for the merge behavior so existing filesystem values are
not overwritten and future `ThreadItem` fields require an explicit
decision

## Testing
- `just fmt` from `codex-rs`
- `git diff --check -- codex-rs/rollout/src/recorder.rs
codex-rs/rollout/src/recorder_tests.rs`
- Attempted `cargo test -p codex-rollout thread_item_metadata` from
`codex-rs`; blocked in dependency fetch/setup after updating crates.io
and git submodules `https://github.com/livekit/protocol` and
`https://chromium.googlesource.com/libyuv/libyuv`, so the focused tests
did not run
2026-04-22 14:59:20 -07:00
Michael Bolin
d3dd0d759b exec-server: expose arg0 alias root to fs sandbox (#19016)
## Why

The post-merge `rust-ci-full` run for #18999 still failed the Ubuntu
remote `suite::remote_env` sandboxed filesystem tests. That run checked
out merge commit `ddde50c611e4800cb805f243ed3c50bbafe7d011`, so the arg0
guard lifetime fix was present.

The Docker-backed failure had two remaining pieces:

- The sandboxed filesystem helper needs to execute Codex through the
`codex-linux-sandbox` arg0 alias path. The helper sandbox was only
granting read access to the real Codex executable parent, so the alias
parent also has to be visible inside the helper sandbox.
- The remote-env tests were building sandbox contexts with
`FileSystemSandboxContext::new()`, which captures the local test runner
cwd. In the Docker remote exec-server, that host checkout path does not
exist, so spawning the filesystem helper failed with `No such file or
directory` before the helper could process the request.

## What Changed

- Track all helper runtime read roots instead of a single root.
- Add both the real Codex executable parent and the
`codex-linux-sandbox` alias parent to sandbox readable roots.
- Avoid sending an unused local cwd in remote filesystem sandbox
contexts when the permission profile has no cwd-dependent entries.
- Build the Docker remote-env test sandbox contexts with a cwd path that
exists inside the container.
- Add unit coverage for the alias-parent root and remote sandbox cwd
handling.

## Verification

- `cargo test -p codex-exec-server`
- `cargo test -p codex-core
remote_test_env_sandboxed_read_allows_readable_root`
- `just fix -p codex-exec-server`
- `just fix -p codex-core`
2026-04-22 21:34:22 +00:00
Leo Shimonaka
16eeeb534a Fix MCP permission policy sync (#19033)
###### Why/Context/Summary

Repro: start a session outside Full Access, switch permissions to Full
Access, then submit a new turn that triggers MCP/CUA permission
handling.

The turn used the live Full Access `SessionConfiguration`, but the MCP
coordinator was still synced from the stale `original_config_do_not_use`
/ per-turn config copy. That left the coordinator with an old sandbox
policy, so empty MCP permission elicitations could be denied instead of
auto-accepted.

Fix: update/rebuild the MCP connection manager from the live
turn/session approval and sandbox policy fields.

###### Test plan

```sh
just fmt
cargo test -p codex-core --lib
cargo test -p codex-core --lib mcp_tool_call::tests
```
2026-04-22 14:30:29 -07:00
viyatb-oai
2d73bac45f feat: add guardian network approval trigger context (#18197)
## Summary

Give guardian network-access reviews the command context that triggered
a managed-network approval. The prompt JSON now includes the originating
tool call id, tool name, command argv, cwd, sandbox permissions,
additional permissions, justification, and tty state when a single
active tool call can be attributed.

The implementation keeps the trigger shape canonical by serializing
`GuardianNetworkAccessTrigger` directly and lets each runtime build that
trigger from its `ToolCtx`. Non-guardian approval prompts avoid cloning
the full trigger payload.

## UX changes

Guardian network-access reviews now include a `trigger` object that
explains what command caused the network approval. Instead of seeing
only the requested host, the guardian reviewer can also see the
originating tool call, argv, working directory, sandbox mode,
justification, and tty state.

Example payload the guardian reviewer can see:

```json
{
  "tool": "network_access",
  "target": "https://api.github.com:443",
  "host": "api.github.com",
  "protocol": "https",
  "port": 443,
  "trigger": {
    "callId": "call_abc123",
    "toolName": "shell",
    "command": ["gh", "api", "/repos/openai/codex/pulls/18197"],
    "cwd": "/workspace/codex",
    "sandboxPermissions": "require_escalated",
    "justification": "Fetch PR metadata from GitHub.",
    "tty": false
  }
}
```

The network review itself remains scoped to the network decision:
`target_item_id` stays `null`. `trigger.callId` is attribution context
only, so clients can still distinguish network reviews from
item-targeted command reviews.

## Verification

- Added coverage for serializing network trigger context in guardian
approval JSON.
- Added regression coverage that network guardian reviews do not reuse
`trigger.callId` as `target_item_id`.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-22 14:00:53 -07:00
Ahmed Ibrahim
9360f267f3 [2/4] Implement executor HTTP request runner (#18582)
### Why
Remote streamable HTTP MCP needs the executor to perform ordinary HTTP
requests on the executor side. This keeps network placement aligned with
`experimental_environment = "remote"` without adding MCP-specific
executor APIs.

### What
- Add an executor-side `http/request` runner backed by `reqwest`.
- Validate request method and URL scheme, preserving the transport
boundary at plain HTTP.
- Return buffered responses for ordinary calls and emit ordered
`http/request/bodyDelta` notifications for streaming responses.
- Register the request handler in the exec-server router.
- Document the runner entrypoint, conversion helpers, body-stream
bridge, notification sender, timeout behavior, and new integration-test
helpers.
- Add exec-server integration tests with the existing websocket harness
and a local TCP HTTP peer for buffered and streamed responses, with
comments spelling out what each test proves and its
setup/exercise/assert phases.

### Stack
1. #18581 protocol
2. #18582 runner
3. #18583 RMCP client
4. #18584 manager wiring and local/remote coverage

### Verification
- `just fmt`
- `cargo check -p codex-exec-server -p codex-rmcp-client --tests`
- `cargo check -p codex-core --test all` compile-only
- `git diff --check`
- Online full CI is running from the `full-ci` branch, including the
remote Rust test job.

Co-authored-by: Codex <noreply@openai.com>

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-22 20:36:34 +00:00
Michael Bolin
18a26d7bbc app-server: accept permission profile overrides (#18279)
## Why

`PermissionProfile` is becoming the canonical permissions shape shared
by core and app-server. After app-server responses expose the active
profile, clients need to be able to send that same shape back when
starting, resuming, forking, or overriding a turn instead of translating
through the legacy `sandbox`/`sandboxPolicy` shorthands.

This still needs to preserve the existing requirements/platform
enforcement model. A profile-shaped request can be downgraded or
rejected by constraints, but the server should keep the user's
elevated-access intent for project trust decisions. Turn-level profile
overrides also need to retain existing read protections, including
deny-read entries and bounded glob-scan metadata, so a permission
override cannot accidentally drop configured protections such as
`**/*.env = deny`.

## What changed

- Adds optional `permissionProfile` request fields to `thread/start`,
`thread/resume`, `thread/fork`, and `turn/start`.
- Rejects ambiguous requests that specify both `permissionProfile` and
the legacy `sandbox`/`sandboxPolicy` fields, including running-thread
resume requests.
- Converts profile-shaped overrides into core runtime filesystem/network
permissions while continuing to derive the constrained legacy sandbox
projection used by existing execution paths.
- Preserves project-trust intent for profile overrides that are
equivalent to workspace-write or full-access sandbox requests.
- Preserves existing deny-read entries and `globScanMaxDepth` when
applying turn-level `permissionProfile` overrides.
- Updates app-server docs plus generated JSON/TypeScript schema fixtures
and regression coverage.

## Verification

- `cargo test -p codex-app-server-protocol schema_fixtures`
- `cargo test -p codex-core
session_configuration_apply_permission_profile_preserves_existing_deny_read_entries`







---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18279).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* __->__ #18279
2026-04-22 13:34:33 -07:00
Dylan Hurd
ed4def8286 feat(auto-review) short-circuit (#18890)
## Summary
Short circuit the convo if auto-review hits too many denials

## Testing
- [x] Added unit tests

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-22 20:34:15 +00:00
xl-openai
b77791c228 feat: Fairly trim skill descriptions within context budget (#18925)
Preserve skill name/path entries whenever possible and trim descriptions
first, using round-robin character allocation so short descriptions do
not waste budget.
2026-04-22 12:33:29 -07:00
Michael Bolin
ddde50c611 arg0: keep dispatch aliases alive during async main (#18999)
## Why

The Ubuntu GNU remote Cargo run has been regularly failing sandboxed
`suite::remote_env` filesystem tests with `No such file or directory`,
while the same cases pass under Bazel. The Cargo remote-env setup starts
`target/debug/codex exec-server` inside Docker via
`scripts/test-remote-env.sh`. That CLI builds `codex-linux-sandbox` and
other arg0 helper aliases in a temporary directory, then passes those
alias paths into the exec-server runtime.

`arg0_dispatch_or_else` constructed `Arg0DispatchPaths` from that
temporary alias guard, but then awaited the async CLI entry point
without otherwise keeping the guard live. That allowed the guard to be
dropped while the exec-server was still running, removing the helper
alias directory. Later sandboxed filesystem calls tried to spawn the
now-deleted `codex-linux-sandbox` path and surfaced as `ENOENT`.

The relevant distinction I found is that `core/tests/common` stores the
result of `arg0_dispatch()` in a process-lifetime
`OnceLock<Option<Arg0PathEntryGuard>>` for test binaries. The Cargo
remote-env setup exercises a real `codex exec-server` process instead,
so it depends on the normal CLI lifetime behavior fixed here.

## What Changed

- Keep the arg0 tempdir guard alive until `main_fn(paths).await`
completes.
- Keep the helper on the real `arg0_dispatch()` shape, where alias setup
can fail and return `None` in production.
- Add a regression test that uses an explicit guard, yields once, and
verifies the generated helper alias path still exists while the async
entry point is running.

## Verification

- `cargo test -p codex-arg0`
- `just argument-comment-lint -p codex-arg0`
- `just fix -p codex-arg0`
2026-04-22 11:06:34 -07:00
Won Park
11e5af53c4 Add plumbing to approve stored Auto-Review denials (#18955)
## Summary

This adds the structural plumbing needed for an app-server client to
approve a previously denied Guardian review and carry that approval
context into the next model turn.

This PR does not add the actual `/auto-review-denials` tool 

## What Changed

- Added app-server v2 RPC `thread/approveGuardianDeniedAction`.
- Added generated JSON schema and TypeScript fixtures for
`ThreadApproveGuardianDeniedAction*`.
- Added core `Op::ApproveGuardianDeniedAction`.
- Added a core handler that validates the event is a denied Guardian
assessment and injects a developer message containing the stored denial
event JSON.
- Queues the approval context for the next turn if there is no active
turn yet.
- Added the TUI app-server bridge so `Op::ApproveGuardianDeniedAction {
event }` is routed to the app-server request.

## What This Does Not Do

- Does not add `/auto-review-denials`.
- Does not add chat widget recent-denial state.
- Does not add popup/list UI.
- Does not add a product-facing denial lookup/store.
- Does not change where Guardian denials are originally emitted or
persisted.

## Verification

- `cargo test -p codex-tui thread_approve_guardian_denied_action`
2026-04-22 10:38:19 -07:00
Dylan Hurd
78593d72ea feat(auto-review) policy config (#18959)
## Summary
Allow users to customize their own auto-review policy config.

## Testing
- [x] added config_tests
2026-04-22 10:33:02 -07:00
cassirer-openai
f67383bcba [rollout_trace] Record core session rollout traces (#18877)
## Summary

Wires rollout trace recording into `codex-core` session and turn
execution. This records the core model request/response, compaction, and
session lifecycle boundaries needed for replay without yet tracing every
nested runtime/tool boundary.

## Stack

This is PR 2/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This layer is the first live integration point. The important review
question is whether trace recording is isolated from normal session
behavior: trace failures should not become user-visible execution
failures, and recording should preserve the existing turn/session
lifecycle semantics.

The PR depends on the reducer/data model from the first stack entry and
only introduces the core recorder surface that later PRs use for richer
runtime and relationship events.
2026-04-22 17:00:48 +00:00
Eric Traut
79ea577156 TUI: Keep remote app-server events draining (#18932)
Addresses #18860

Problem: Remote app-server clients could stop draining websocket events
when their bounded local event channel filled, leaving clients stuck on
stale in-progress turns after a disconnect.

Solution: Use an unbounded local event channel for the remote client so
the websocket reader can keep forwarding disconnect and progress events
instead of blocking or dropping them.

Why this is reasonable: This does not make the remote websocket itself
unbounded. The changed queue lives inside the remote client, between the
task that reads the remote websocket and the API consumer in the same
client process. Once an event has been received from the remote server,
preserving it is preferable to blocking websocket reads or dropping
disconnect/lifecycle events; network-level backpressure still happens at
the websocket boundary if the remote side outpaces the client.
2026-04-22 09:29:34 -07:00
Steve Coffey
0127cef5db Stage publishable Python runtime wheels (#18865)
This is PR 2 of the Python SDK PyPI publishing split. [PR
1](https://github.com/openai/codex/pull/18862) refreshed the generated
SDK bindings; this PR makes the runtime package itself publishable, and
PR 3 will wire the SDK package/version pinning to this runtime package.

## Summary
- Rename the runtime distribution to `openai-codex-cli-bin` while
keeping the import package as `codex_cli_bin`.
- Make the runtime package wheel-only and build `py3-none-<platform>`
wheels instead of interpreter-specific wheels.
- Add `stage-runtime --codex-version` and `--platform-tag` so release
staging can produce the platform wheel matrix from Codex release tags.
- Add focused artifact workflow tests for version normalization,
platform tag injection, and runtime wheel metadata.

## Why Rename
There is already an unofficial PyPI package,
[`codex-bin`](https://pypi.org/project/codex-bin/), distributing OpenAI
Codex binaries. Publishing the official SDK runtime dependency as
`openai-codex-cli-bin` makes the ownership clear, avoids confusing the
SDK-pinned runtime wheel with that unowned wrapper, and keeps the import
package unchanged as `codex_cli_bin`.

## Tests
- `uv run --extra dev pytest
tests/test_artifact_workflow_and_binaries.py` -> 21 passed
- `uv run --extra dev python scripts/update_sdk_artifacts.py
stage-runtime /tmp/codex-python-pr2-rebased/runtime-stage
/tmp/codex-python-pr2-rebased/codex --codex-version
rust-v0.116.0-alpha.1 --platform-tag macosx_11_0_arm64`
- `uv run --with build --extra dev python -m build --wheel
/tmp/codex-python-pr2-rebased/runtime-stage`
- `uv run --with twine --extra dev twine check
/tmp/codex-python-pr2-rebased/runtime-stage/dist/openai_codex_cli_bin-0.116.0a1-py3-none-macosx_11_0_arm64.whl`

## Note
- Full `uv run --extra dev pytest` currently fails because regenerating
from schemas already on `main` adds new DeviceKey Python types. I left
that generated catch-up out of this runtime-only PR.
2026-04-22 08:14:48 -07:00
Vaibhav Srivastav
0ebe69a8c3 [codex] Update imagegen system skill (#18852)
## Summary

This updates the embedded `imagegen` system skill in `codex-rs/skills`
with the ImageGen 2 skill changes from `openai/skills-internal#87`.

The bundled skill now keeps normal image generation/editing on the
built-in `image_gen` path, updates the CLI fallback defaults to
`gpt-image-2`, and routes explicit transparent-output requests through
`gpt-image-1.5` with clear guidance that `gpt-image-2` does not support
transparent backgrounds.

## Details

- Update `SKILL.md` routing guidance for built-in vs CLI fallback
behavior.
- Update CLI/API references for `gpt-image-2` size constraints, quality
options, near-4K sizes, and unsupported options.
- Update `scripts/image_gen.py` defaults and validation:
  - default model `gpt-image-2`
  - default size `auto`
  - default quality `medium`
  - reject transparent backgrounds on `gpt-image-2`
  - reject `input_fidelity` on `gpt-image-2`
- validate flexible `gpt-image-2` sizes and suggest `3824x2160` /
`2160x3824` for near-4K requests
- Update prompt/reference docs with the new model and routing guidance.

## Validation

- `cargo test -p codex-skills`
- `git diff --check`
- Manual CLI dry-runs for:
  - default `gpt-image-2` payload
  - `3824x2160` near-4K size acceptance
  - `3840x2160` rejection with near-4K guidance
  - transparent background rejection on `gpt-image-2`
  - transparent background acceptance on `gpt-image-1.5`
  - `input_fidelity` rejection on `gpt-image-2`

Bazel target check was not run locally because `bazel` is not installed
in this environment.
2026-04-22 15:08:10 +00:00
jif-oai
65420737e8 chore: prep memories for AB (#18973) 2026-04-22 11:46:15 +01:00
jif-oai
ddf65c9647 fix: cargo deny (#18971) 2026-04-22 11:46:11 +01:00
jif-oai
639382609f fix: wait_agent timeout for queued mailbox mail (#18968)
## Why

`wait_agent` can be called while mailbox mail is already pending. The
previous implementation subscribed for future mailbox sequence changes
and then waited for the next notification. If the mail was queued before
that wait started, no new notification arrived, so the tool could sit
until `timeout_ms` even though mail was ready to deliver.

## What Changed

- Added `Session::has_pending_mailbox_items()` for checking pending
mailbox mail through the session API.
- Updated `multi_agents_v2::wait` to return immediately when pending
mailbox mail already exists before sleeping on a new mailbox sequence
update.
- Reworked the regression coverage in `multi_agents_tests.rs` so already
queued mailbox mail must wake `wait_agent` promptly.

Relevant code:
- [`wait_agent` pending-mail
check](aa8ca06e83/codex-rs/core/src/tools/handlers/multi_agents_v2/wait.rs (L55-L60))
-
[`Session::has_pending_mailbox_items`](aa8ca06e83/codex-rs/core/src/session/mod.rs (L2979-L2981))
-
[`multi_agent_v2_wait_agent_returns_for_already_queued_mail`](aa8ca06e83/codex-rs/core/src/tools/handlers/multi_agents_tests.rs (L2854))

## Verification

- `cargo test -p codex-core
multi_agent_v2_wait_agent_returns_for_already_queued_mail`
2026-04-22 11:16:17 +01:00
acrognale-oai
4f8c58f737 Support multiple cwd filters for thread list (#18502)
## Summary

- Teach app-server `thread/list` to accept either a single `cwd` or an
array of cwd filters, returning threads whose recorded session cwd
matches any requested path
- Add `useStateDbOnly` as an explicit opt-in fast path for callers that
want to answer `thread/list` from SQLite without scanning JSONL rollout
files
- Preserve backwards compatibility: by default, `thread/list` still
scans JSONL rollouts and repairs SQLite state
- Wire the new cwd array and SQLite-only options through app-server,
local/remote thread-store, rollout listing, generated TypeScript/schema
fixtures, proto output, and docs

## Test Plan

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-rollout`
- `cargo test -p codex-thread-store`
- `cargo test -p codex-app-server thread_list`
- `just fmt`
- `just fix -p codex-app-server-protocol -p codex-rollout -p
codex-thread-store -p codex-app-server`
- `cargo build -p codex-cli --bin codex`
2026-04-22 06:10:09 -04:00
jif-oai
b04ffeee4c nit: expose lib (#18962)
As a follow-up
2026-04-22 10:06:53 +01:00
rhan-oai
213b17b7a3 [codex-analytics] guardian review TTFT plumbing and emission (#17696)
## Why

Guardian analytics includes time-to-first-token, but the Guardian
reviewer runs as a normal Codex session and `TurnCompleteEvent` did not
expose TTFT. The timing needs to flow through the standard
turn-completion protocol so Guardian review analytics can consume the
same value as the rest of the session machinery.

## What changed

Adds optional `time_to_first_token_ms` to `TurnCompleteEvent` and
populates it from `TurnTiming`. The value is carried through app-server
thread history, rollout reconstruction, TUI/app-server adapters, and
Guardian review session handling.

Guardian review analytics now captures TTFT from the reviewer
turn-complete event when available. Existing tests and fixtures are
updated to set the new optional field to `None` where TTFT is not
relevant.

## Verification

- `cargo clippy -p codex-tui --tests -- -D warnings`
- `cargo clippy -p codex-core --lib --tests -- -D warnings`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17696).
* __->__ #17696
* #17695
* #17693
* #18278
* #18953
2026-04-22 01:52:48 -07:00
rhan-oai
37aadeaa13 [codex-analytics] guardian review truncation (#17695)
## Why

The Guardian review event needs to report whether the action shown to
Guardian was truncated. That field should come from the same truncation
path used to build the Guardian prompt, rather than being inferred after
the fact.

## What changed

Plumbs truncation metadata through Guardian action formatting, prompt
construction, review session execution, and analytics emission.
`guardian_truncate_text` now reports both the rendered text and whether
it inserted the truncation marker, and `reviewed_action_truncated` is
set from that prompt-building result.

This keeps the analytics field aligned with the model-visible reviewed
action while preserving the existing Guardian prompt behavior.

## Verification

- Guardian truncation tests cover both truncated and non-truncated
action payloads.
- Guardian review tests assert the review session metadata and
truncation field are propagated.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17695).
* #17696
* __->__ #17695
* #17693
* #18278
* #18953
2026-04-22 08:35:29 +00:00
rhan-oai
4e7399c6b9 [codex-analytics] guardian review analytics events emission (#17693)
## Why

Guardian approvals now run as review sessions, but Codex analytics did
not have a terminal event for those reviews. That made it hard to
measure approval outcomes, failure modes, Guardian session reuse, model
metadata, token usage, and timing separately from the parent turn.

## What changed

Adds `codex_guardian_review` analytics emission for Guardian approval
reviews. The event is emitted from the Guardian review path with review
identity, target item id, approval request source, a PII-minimized
reviewed-action shape, terminal decision/status, failure reason,
Guardian assessment fields, Guardian session metadata, token usage, and
timing metadata.

The reviewed-action payload intentionally omits high-risk fields such as
shell commands, working directories, argv, file paths, network
targets/hosts, rationale, retry reason, and permission justifications.
It also classifies prompt-build failures separately from Guardian
session/runtime failures so fail-closed cases are distinguishable in
analytics.

## Verification

- Guardian review analytics tests cover terminal success,
timeout/cancel/fail-closed paths, session metadata, and token usage
plumbing.
- `cargo clippy -p codex-core --lib --tests -- -D warnings`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17693).
* #17696
* #17695
* __->__ #17693
2026-04-22 01:02:47 -07:00
Michael Bolin
5eab9ff8ca app-server: expose thread permission profiles (#18278)
## Why

The `PermissionProfile` migration needs app-server clients to see the
same constrained permission model that core is using at runtime. Before
this PR, thread lifecycle responses only exposed the legacy
`SandboxPolicy` shape, so clients still had to infer active permissions
from sandbox fields. That makes downstream resume, fork, and override
flows harder to make `PermissionProfile`-first.

External sandbox policies are intentionally excluded from this canonical
view. External enforcement cannot be round-tripped as a
`PermissionProfile`, and exposing a lossy root-write profile would let
clients accidentally change sandbox semantics if they echo the profile
back later.

## What changed

- Adds the app-server v2 `PermissionProfile` wire shape, including
filesystem permissions and glob scan depth metadata.
- Adds `PermissionProfileNetworkPermissions` so the profile response
does not expose active network state through the older
additional-permissions naming.
- Returns `permissionProfile` from thread start, resume, and fork
responses when the active sandbox can be represented as a
`PermissionProfile`.
- Keeps legacy `sandbox` in those responses for compatibility and
documents `permissionProfile` as canonical when present.
- Makes lifecycle `permissionProfile` nullable and returns `null` for
`ExternalSandbox` to avoid exposing a lossy profile.
- Regenerates the app-server JSON schema and TypeScript fixtures.

## Verification

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server
thread_response_permission_profile_omits_external_sandbox --
--nocapture`
- `cargo check --tests -p codex-analytics -p codex-exec -p codex-tui`
- `just fix -p codex-app-server-protocol -p codex-app-server -p
codex-analytics -p codex-exec -p codex-tui`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18278).
* #18279
* __->__ #18278
2026-04-21 23:52:56 -07:00
iceweasel-oai
3a451b6321 use long-lived sessions for codex sandbox windows (#18953)
`codex sandbox windows` previously did a one-shot spawn for all
commands.
This change uses the `unified_exec` session to spawn long-lived
processes instead, and implements a simple bridge to forward stdin to
the spawned session and stdout/stderr from the spawned session back to
the caller.

It also fixes a bug with the new shared spawn context code where the
"no-network env" was being applied to both elevated and unelevated
sandbox spawns. It should only be applied for the unelevated sandbox
because the elevated one uses firewall rules instead of an env-based
network suppression strategy.
2026-04-22 06:39:29 +00:00
efrazer-oai
69c8913e24 feat: add explicit AgentIdentity auth mode (#18785)
## Summary

This PR adds `CodexAuth::AgentIdentity` as an explicit auth mode.

An AgentIdentity auth record is a standalone `auth.json` mode. When
`AuthManager::auth().await` loads that mode, it registers one
process-scoped task and stores it in runtime-only state on the auth
value. Header creation stays synchronous after that because the task is
initialized before callers receive the auth object.

This PR also removes the old feature flag path. AgentIdentity is
selected by explicit auth mode, not by a hidden flag or lazy mutation of
ChatGPT auth records.

Reference old stack: https://github.com/openai/codex/pull/17387/changes

## Design Decisions

- AgentIdentity is a real auth enum variant because it can be the only
credential in `auth.json`.
- The process task is ephemeral runtime state. It is not serialized and
is not stored in rollout/session data.
- Account/user metadata needed by existing Codex backend checks lives on
the AgentIdentity record for now.
- `is_chatgpt_auth()` remains token-specific.
- `uses_codex_backend()` is the broader predicate for ChatGPT-token auth
and AgentIdentity auth.

## Stack

1. https://github.com/openai/codex/pull/18757: full revert
2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
crate
3. This PR: explicit AgentIdentity auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate Codex backend
auth callsites through AuthProvider
5. https://github.com/openai/codex/pull/18904: accept AgentIdentity JWTs
and load `CODEX_AGENT_IDENTITY`

## Testing

Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
2026-04-21 22:33:24 -07:00
Michael Bolin
0fef35dc3a core: derive active permission profiles (#18277)
## Why

`Permissions` should not store a separate `PermissionProfile` that can
drift from the constrained `SandboxPolicy` and network settings. The
active profile needs to be derived from the same constrained values that
already honor `requirements.toml`.

## What changed

This adds derivation of the active `PermissionProfile` from the
constrained runtime permission settings and exposes that derived value
through config snapshots and thread state. The app-server can then
report the active profile without introducing a second source of truth.

## Verification

- `cargo test -p codex-core --test all permissions_messages --
--nocapture`
- `cargo test -p codex-core --test all request_permissions --
--nocapture`



























---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18277).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* #18279
* #18278
* __->__ #18277
2026-04-21 22:11:40 -07:00
Celia Chen
51fdc35945 chore: remove unused Bedrock auth lazy loading (#18948)
## Summary

The Bedrock Mantle SigV4 auth provider currently looks like it can
lazily load `AwsAuthContext`, but the provider is only constructed after
`resolve_auth_method` has already loaded that context. Because
`with_context` always pre-populates the `OnceCell`, the
`get_or_try_init` fallback is unused in normal operation and makes the
provider lifecycle harder to reason about.

This change removes that dead lazy-loading path and makes the actual
behavior explicit:

- `BedrockAuthMethod::AwsSdkAuth` carries only the resolved
`AwsAuthContext`.
- `BedrockMantleSigV4AuthProvider` stores the resolved context directly.
- request signing uses the stored context without going through
`OnceCell`.

The existing eager AWS auth resolution behavior is unchanged; this is a
simplification of the provider state, not a behavior change.

## Testing

- `cargo shear`
- `cargo test -p codex-model-provider`
- `just bazel-lock-check`
2026-04-22 05:01:22 +00:00
Dylan Hurd
34800d717e [codex] Clean guardian instructions (#18934)
## Summary
- Keep the guardian policy installed as guardian base instructions.
- Clear inherited parent `developer_instructions` for guardian review
sessions.
- Update guardian config tests to assert developer instructions are
cleared and policy text is sourced from base instructions.

## Why
Guardian review sessions are intended to run under an isolated guardian
policy. Because the guardian config is cloned from the parent config,
inherited custom or managed developer instructions could otherwise
remain active and conflict with guardian review behavior.

## Validation
- `just fmt`
- `cargo test -p codex-core guardian_review_session_config`

Co-authored-by: Codex <noreply@openai.com>
2026-04-21 21:47:58 -07:00
Michael Bolin
faed6d5c07 tests: serialize process-heavy Windows CI suites (#18943)
## Why

A [Windows Cargo
build](https://github.com/openai/codex/actions/runs/24754807756/job/72425641062)
on `main` timed out in several unrelated-looking suites at the same
time:

- `codex-app-server` account tests failed before account logic, while
`mcp.initialize()` was waiting for the first JSON-RPC response.
- `codex-core` `apply_patch_cli` tests timed out while running full
Codex/apply_patch turns.
- `codex-windows-sandbox` legacy session tests timed out while creating
restricted-token child processes and private desktops.

The app-server log reached the test harness write path in
[`McpProcess::initialize_with_params`](731b54d08f/codex-rs/app-server/tests/common/mcp_process.rs (L244-L263)),
but never printed the matching stdout read from
[`read_jsonrpc_message`](731b54d08f/codex-rs/app-server/tests/common/mcp_process.rs (L1123-L1128)).
The server initialize handler is a small bookkeeping/response path
([`message_processor.rs`](731b54d08f/codex-rs/app-server/src/message_processor.rs (L601-L728))),
so the failure looks like Windows runner process/pipe scheduling
starvation rather than account-specific behavior.

## What Changed

This updates `.config/nextest.toml` to serialize two process-heavy sets:

- `codex-core` tests matching `package(codex-core) & kind(test) &
test(apply_patch_cli)`
- `codex-windows-sandbox` tests matching `package(codex-windows-sandbox)
& test(legacy_)`

`codex-app-server` integration tests were already serialized inside
their own package; this change reduces overlap with the other suites
that were saturating the runner at the same time.

## Verification

- `cargo nextest list --filterset "package(codex-core) & kind(test) &
test(apply_patch_cli)"`
- `cargo nextest list --filterset "package(codex-windows-sandbox) &
test(legacy_)"`

The Windows sandbox filter naturally lists no tests on macOS, but it
validates the nextest filter/config syntax locally.
2026-04-21 21:14:45 -07:00
Dylan Hurd
0e39614d87 chore(tui) debug-config guardian_policy_config (#18923)
## Summary
List guardian_policy_config_source in `/debug-config` output

## Testing
 - [x] Ran locally
2026-04-21 21:00:23 -07:00
Eric Traut
c7e5a9d95e Keep TUI status surfaces in sync (#18935) 2026-04-21 20:39:23 -07:00
Michael Bolin
03ae4db0f4 ci: keep argument comment lint checks materialized (#18926)
## Why

The fast `rust-ci` workflow decides whether to run the cross-platform
`argument-comment-lint` job based on changed paths. PRs that touch
Rust-adjacent Bazel wrapper files, such as `defs.bzl` or
`workspace_root_test_launcher.*.tpl`, can change how Rust tests and lint
targets behave without changing any `.rs` files.

When that detector returned false, GitHub skipped the matrix job before
expanding it. That produced a single skipped check named `Argument
comment lint - ${{ matrix.name }}` instead of the Linux, macOS, and
Windows check names that branch protection expects, leaving the PR
unable to go green when those matrix checks are required.

## What Changed

- Treat root Bazel wrapper files as `argument-comment-lint` relevant
changes.
- Keep the `argument_comment_lint_prebuilt` matrix job materialized for
every PR so the per-platform check names always exist.
- Add a single gate step that decides whether the real lint work should
run.
- Move the checkout-adjacent Bazel setup and OS-specific lint commands
into `.github/actions/run-argument-comment-lint/action.yml` so the
workflow does not repeat the same path-detection condition on each step.

## Verification

- Parsed `.github/workflows/rust-ci.yml` and
`.github/actions/run-argument-comment-lint/action.yml` with Python YAML
loading.
- Simulated the workflow path-matching shell conditions for the root
Bazel wrapper files and confirmed they set `argument_comment_lint=true`.
2026-04-22 03:36:46 +00:00
Michael Bolin
36f8bb4ffa exec-server: carry filesystem sandbox profiles (#18276)
## Why

The exec-server still needs platform sandbox inputs, but the migration
should preserve the `PermissionProfile` that produced them. Keeping only
the derived legacy sandbox map would keep `SandboxPolicy` as the
effective abstraction and would make full-disk vs. restricted profiles
harder to preserve as the permissions stack starts round-tripping
profiles.

`PermissionProfile` entries can also be cwd-sensitive (`:cwd`,
`:project_roots`, relative globs), so the exec-server must carry the
request sandbox cwd instead of resolving those entries against the
long-lived exec-server process cwd.

## What changed

`FileSystemSandboxContext` now carries `permissions: PermissionProfile`
plus an optional `cwd`:

- removed `sandboxPolicy`, `sandboxPolicyCwd`,
`fileSystemSandboxPolicy`, and `additionalPermissions`
- added `permissions` and `cwd`
- kept the platform knobs `windowsSandboxLevel`,
`windowsSandboxPrivateDesktop`, and `useLegacyLandlock`

Core turn and apply-patch paths populate the context from the active
runtime permissions and request cwd. Exec-server derives platform
`SandboxPolicy`/`FileSystemSandboxPolicy` at the filesystem boundary,
adds helper runtime reads there, and rejects cwd-dependent profiles that
arrive without a cwd.

The legacy `FileSystemSandboxContext::new(SandboxPolicy)` constructor
now preserves the old workspace-write conversion semantics for
compatibility tests/callers.

## Verification

- `cargo test -p codex-exec-server`
- `cargo test -p codex-exec-server sandbox_cwd -- --nocapture`
- `cargo test -p codex-exec-server
sandbox_context_new_preserves_legacy_workspace_write_read_only_subpaths
-- --nocapture`
- `cargo test -p codex-core --lib
file_system_sandbox_context_uses_active_attempt -- --nocapture`
2026-04-21 20:22:28 -07:00
efrazer-oai
564860e8bd refactor: add agent identity crate (#18871)
## Summary

This PR adds `codex-agent-identity` as an isolated crate for Agent
Identity business logic.

The crate owns:
- AgentAssertion construction.
- Agent task registration.
- private-key assertion signing.
- bounded blocking HTTP for task registration.

It does not wire AgentIdentity into `auth.json`, `AuthManager`, rollout
state, or request callsites. That integration happens in later PRs.

Reference old stack: https://github.com/openai/codex/pull/17387/changes

## Stack

1. https://github.com/openai/codex/pull/18757: full revert
2. This PR: isolated Agent Identity crate
3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity
auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate Codex backend
auth callsites through AuthProvider
5. https://github.com/openai/codex/pull/18904: accept AgentIdentity JWTs
and load `CODEX_AGENT_IDENTITY`

## Testing

Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
2026-04-21 19:57:49 -07:00
Michael Bolin
8fea372c77 Fix remote app-server shutdown race (#18936)
## Why

A Mac Bazel CI run saw `remote_notifications_arrive_over_websocket` fail
during shutdown with `remote app-server shutdown channel is closed`
(https://app.buildbuddy.io/invocation/9dac05d6-ae20-40f9-b627-fca6e91cf127).
The remote websocket worker can legitimately finish while `shutdown()`
is waiting for the shutdown acknowledgement: after the test server sends
a notification and exits, the worker may deliver the required disconnect
event, observe that the caller has dropped the event receiver, and exit
before it sends the shutdown one-shot.

That state is already terminal cleanup, not a failed shutdown, so
callers should not see a `BrokenPipe` from the acknowledgement channel.

## What Changed

- Treat a closed remote shutdown acknowledgement as an already-exited
worker while still propagating websocket close errors when the worker
returns them.
- Added a deterministic regression test for the interleaving where the
shutdown command is received and the worker exits before replying.

## Verification

- `cargo test -p codex-app-server-client`
- New test:
`remote::tests::shutdown_tolerates_worker_exit_after_command_is_queued`
2026-04-22 02:41:19 +00:00
xl-openai
a978e411f6 feat: Support remote plugin list/read. (#18452)
Add a temporary internal remote_plugin feature flag that merges remote
marketplaces into plugin/list and routes plugin/read through the remote
APIs when needed, while keeping pure local marketplaces working as
before.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-21 18:39:07 -07:00
Michael Bolin
536952eeee bazel: run wrapped Rust unit test shards (#18913)
## Why

The `codex-tui` Cargo test suite was catching stale snapshot
expectations, but the matching Bazel unit-test target was still green.
The TUI unit target is wrapped by `workspace_root_test` so tests run
from the repository root and Insta can resolve Cargo-like snapshot
paths. After native Bazel sharding was enabled for that wrapped target,
rules_rust also inserted its own sharding wrapper around the Rust test
binary.

Those two wrappers did not compose: rules_rust's sharding wrapper
expects to run from its own runfiles cwd, while `workspace_root_test`
deliberately changes cwd to the repo root before invoking the test. In
that configuration, the inner wrapper could fail to enumerate the Rust
tests and exit successfully with empty shards, so snapshot regressions
were not being exercised by Bazel.

## What Changed

- Stop enabling rules_rust's inner `experimental_enable_sharding` for
unit-test binaries created by `codex_rust_crate`.
- Keep the configured `shard_count` on the outer `workspace_root_test`
target.
- Add libtest sharding directly to `workspace_root_test_launcher.sh.tpl`
and `workspace_root_test_launcher.bat.tpl` after the launcher has
resolved the actual test binary and established the intended
repository-root cwd.
- Partition tests by a stable FNV-1a hash of each libtest test name,
matching the stable-shard behavior we wanted without depending on the
inner rules_rust wrapper.
- Preserve ad-hoc local test filters by running the resolved test binary
directly when explicit test args are supplied.
- On Windows, run selected libtest names from the shard list in bounded
PowerShell batches instead of concatenating every selected test into one
`cmd.exe` command line.

This PR is stacked on top of #18912, which contains only the snapshot
expectation updates exposed once the Bazel target actually runs the TUI
unit tests. It is also the reason #18916 becomes visible: once this
wrapper fix makes Bazel execute the affected `codex-core` test, that
test needs its own executable-path setup fixed.

## Verification

- `cargo test -p codex-tui`
- `bazel test //codex-rs/tui:tui-unit-tests --test_output=errors`
- `bazel test //codex-rs/tui:all --test_output=errors`
- `bash -n workspace_root_test_launcher.sh.tpl`
- Exercised the Windows PowerShell batching fragment locally with a fake
test binary and shard-list file.
2026-04-21 18:35:47 -07:00
Celia Chen
1cd3ad1f49 feat: add AWS SigV4 auth for OpenAI-compatible model providers (#17820)
## Summary

Add first-class Amazon Bedrock Mantle provider support so Codex can keep
using its existing Responses API transport with OpenAI-compatible
AWS-hosted endpoints such as AOA/Mantle.

This is needed for the AWS launch path, where provider traffic should
authenticate with AWS credentials instead of OpenAI bearer credentials.
Requests are authenticated immediately before transport send, so SigV4
signs the final method, URL, headers, and body bytes that `reqwest` will
send.

## What Changed

- Added a new `codex-aws-auth` crate for loading AWS SDK config,
resolving credentials, and signing finalized HTTP requests with AWS
SigV4.
- Added a built-in `amazon-bedrock` provider that targets Bedrock Mantle
Responses endpoints, defaults to `us-east-1`, supports region/profile
overrides, disables WebSockets, and does not require OpenAI auth.
- Added Amazon Bedrock auth resolution in `codex-model-provider`: prefer
`AWS_BEARER_TOKEN_BEDROCK` when set, otherwise use AWS SDK credentials
and SigV4 signing.
- Added `AuthProvider::apply_auth` and `Request::prepare_body_for_send`
so request-signing providers can sign the exact outbound request after
JSON serialization/compression.
- Determine the region by taking the `aws.region` config first (required
for bearer token codepath), and fallback to SDK default region.

## Testing
Amazon Bedrock Mantle Responses paths:

- Built the local Codex binary with `cargo build`.
- Verified the custom proxy-backed `aws` provider using `env_key =
"AWS_BEARER_TOKEN_BEDROCK"` streamed raw `responses` output with
`response.output_text.delta`, `response.completed`, and `mantle-env-ok`.
- Verified a full `codex exec --profile aws` turn returned
`mantle-env-ok`.
- Confirmed the custom provider used the bearer env var, not AWS profile
auth: bogus `AWS_PROFILE` still passed, empty env var failed locally,
and malformed env var reached Mantle and failed with `401
invalid_api_key`.
- Verified built-in `amazon-bedrock` with `AWS_BEARER_TOKEN_BEDROCK` set
passed despite bogus AWS profiles, returning `amazon-bedrock-env-ok`.
- Verified built-in `amazon-bedrock` SDK/SigV4 auth passed with
`AWS_BEARER_TOKEN_BEDROCK` unset and temporary AWS session env
credentials, returning `amazon-bedrock-sdk-env-ok`.
2026-04-22 01:11:17 +00:00
Michael Bolin
e18fe7a07f test(core): move prompt debug coverage to integration suite (#18916)
## Why

`build_prompt_input` now initializes `ExecServerRuntimePaths`, which
requires a configured Codex executable path. The previous inline unit
test in `core/src/prompt_debug.rs` built a bare `test_config()` and then
failed before it could assert anything useful:

```text
Codex executable path is not configured
```

This coverage is also integration-shaped: it drives the public
`build_prompt_input` entry point through config, thread, and session
setup rather than testing a small internal helper in isolation.

Bazel CI did not catch this earlier because the affected test was behind
the same wrapped Rust unit-test path fixed by #18913. Before that
launcher/sharding fix, the outer `workspace_root_test` changed the
working directory for Insta compatibility while the inner `rules_rust`
sharding wrapper still expected its runfiles working directory. In
practice, Bazel could report success without executing the Rust test
cases in that shard. Once #18913 makes the wrapper run the Rust test
binary directly and shard with libtest arguments, this stale unit test
actually runs and exposes the missing `codex_self_exe` setup.

## What Changed

- Moved `build_prompt_input_includes_context_and_user_message` out of
`core/src/prompt_debug.rs`.
- Added `core/tests/suite/prompt_debug_tests.rs` and registered it from
`core/tests/suite/mod.rs`.
- Builds the test config with `ConfigBuilder` and provides
`codex_self_exe` using the current test executable, matching the
runtime-path invariant required by prompt debug setup.
- Preserves the existing assertions that the generated prompt input
includes both the debug user message and project-specific user
instructions.

## Verification

- `cargo test -p codex-core --test all
prompt_debug_tests::build_prompt_input_includes_context_and_user_message`
- `bazel test //codex-rs/core:core-all-test
--test_arg=prompt_debug_tests::build_prompt_input_includes_context_and_user_message
--test_output=errors`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18916).
* #18913
* __->__ #18916
2026-04-22 01:08:25 +00:00
Felipe Coury
09ebc34f17 fix(core): emit hooks for apply_patch edits (#18391)
Fixes https://github.com/openai/codex/issues/16732.

## Why

`apply_patch` is Codex's primary file edit path, but it was not emitting
`PreToolUse` or `PostToolUse` hook events. That meant hook-based policy,
auditing, and write coordination could observe shell commands while
missing the actual file mutation performed by `apply_patch`.

The issue also exposed that the hook runtime serialized command hook
payloads with `tool_name: "Bash"` unconditionally. Even if `apply_patch`
supplied hook payloads, hooks would either fail to match it directly or
receive misleading stdin that identified the edit as a Bash tool call.

## What Changed

- Added `PreToolUse` and `PostToolUse` payload support to
`ApplyPatchHandler`.
- Exposed the raw patch body as `tool_input.command` for both
JSON/function and freeform `apply_patch` calls.
- Taught tool hook payloads to carry a handler-supplied hook-facing
`tool_name`.
- Preserved existing shell compatibility by continuing to emit `Bash`
for shell-like tools.
- Serialized the selected hook `tool_name` into hook stdin instead of
hardcoding `Bash`.
- Relaxed the generated hook command input schema so `tool_name` can
represent tools other than `Bash`.

## Verification

Added focused handler coverage for:

- JSON/function `apply_patch` calls producing a `PreToolUse` payload.
- Freeform `apply_patch` calls producing a `PreToolUse` payload.
- Successful `apply_patch` output producing a `PostToolUse` payload.
- Shell and `exec_command` handlers continuing to expose `Bash`.

Added end-to-end hook coverage for:

- A `PreToolUse` hook matching `^apply_patch$` blocking the patch before
the target file is created.
- A `PostToolUse` hook matching `^apply_patch$` receiving the patch
input and tool response, then adding context to the follow-up model
request.
- Non-participating tools such as the plan tool continuing not to emit
`PreToolUse`/`PostToolUse` hook events.

Also validated manually with a live `codex exec` smoke test using an
isolated temp workspace and temp `CODEX_HOME`. The smoke test confirmed
that a real `apply_patch` edit emits `PreToolUse`/`PostToolUse` with
`tool_name: "apply_patch"`, a shell command still emits `tool_name:
"Bash"`, and a denying `PreToolUse` hook prevents the blocked patch file
from being created.
2026-04-21 22:00:40 -03:00
starr-openai
1d4cc494c9 Add turn-scoped environment selections (#18416)
## Summary
- add experimental turn/start.environments params for per-turn
environment id + cwd selections
- pass selections through core protocol ops and resolve them with
EnvironmentManager before TurnContext creation
- treat omitted selections as default behavior, empty selections as no
environment, and non-empty selections as first environment/cwd as the
turn primary

## Testing
- ran `just fmt`
- ran `just write-app-server-schema`
- not run: unit tests for this stacked PR

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-21 17:48:33 -07:00
Michael Bolin
6368f506b7 fix: windows snapshot for external_agent_config_migration::tests::prompt_snapshot did not match windows output (#18915)
Fix a snapshot test that is failing on Windows, but is currently missed
by Bazel due to https://github.com/openai/codex/pull/18913. We see this
failing on Cargo builds on Windows, though.

This Bazel vs. Cargo inconsistency explains why
https://github.com/openai/codex/pull/18768 did not fix the Cargo Windows
build.
2026-04-22 00:32:46 +00:00
Michael Bolin
799e50412e sandboxing: materialize cwd-relative permission globs (#18867)
## Why

#18275 anchors session-scoped `:cwd` and `:project_roots` grants to the
request cwd before recording them for reuse. Relative deny glob entries
need the same treatment. Without anchoring, a stored session permission
can keep a pattern such as `**/*.env` relative, then reinterpret that
deny against a later turn cwd. That makes the persisted profile depend
on the cwd at reuse time instead of the cwd that was reviewed and
approved.

## What changed

`intersect_permission_profiles` now materializes retained
`FileSystemPath::GlobPattern` entries against the request cwd, matching
the existing materialization for cwd-sensitive special paths.

Materialized accepted grants are now deduplicated before deny retention
runs. This keeps the sticky-grant preapproval shape stable when a
repeated request is merged with the stored grant and both `:cwd = write`
and the materialized absolute cwd write are present.

The preapproval check compares against the same materialized form, so a
later request for the same cwd-relative deny glob still matches the
stored anchored grant instead of re-prompting or rejecting.

Tests cover both the storage path and the preapproval path: a
session-scoped `:cwd = write` grant with `**/*.env = none` is stored
with both the cwd write and deny glob anchored to the original request
cwd, cannot be reused from a later cwd, and remains preapproved when
re-requested from the original cwd after merging with the stored grant.

## Verification

- `cargo test -p codex-sandboxing policy_transforms`
- `cargo test -p codex-core --lib
relative_deny_glob_grants_remain_preapproved_after_materialization`
- `cargo clippy -p codex-sandboxing --tests -- -D
clippy::redundant_clone`
- `cargo clippy -p codex-core --lib -- -D clippy::redundant_clone`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18867).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* #18279
* #18278
* #18277
* #18276
* __->__ #18867
2026-04-21 17:28:58 -07:00
canvrno-oai
37701d4654 Update /statusline and /title snapshots (#18909)
Update `/statusline` and `/title` snapshots
2026-04-21 17:16:50 -07:00
alexsong-oai
6bbd710496 [codex] Tighten external migration prompt tests (#18768)
## Summary
- tighten the external migration prompt snapshot around stable synthetic
fixture text
- add focused display_description tests for relative path rewriting and
plugin summaries
- split the path-format assertions into smaller, easier-to-read unit
tests

## Why
The previous prompt snapshot was coupled to path text that came from
detected migration items, which made it noisier and more brittle than
necessary. This change keeps the snapshot focused on stable UI structure
and moves dynamic path formatting checks into targeted unit tests.

## Validation
- cargo test -p codex-tui external_agent_config_migration::tests::
- cargo test -p codex-tui
external_agent_config_migration::tests::display_description_
- just fmt

## Notes
Per the repo instructions, I did not rerun tests after the final `just
fmt` pass.
2026-04-21 16:20:15 -07:00
canvrno-oai
2202675632 Normalize /statusline & /title items (#18886)
This change aligns the `/statusline` and `/title` UIs around the same
normalized item model so both surfaces use consistent ids, labels, and
preview semantics. It keeps the shared preview work from #18435 ,
tightens the remaining mismatches by standardizing item naming, expands
title/status item coverage where appropriate, and makes `/title` preview
use the same title-specific formatting path as the real rendered
terminal title.

- Normalizes persisted item ids and keeps legacy aliases for
compatibility
- Aligns `status-line` and `terminal-title` items with the shared
preview model
- Routes `terminal-title` preview through title-specific formatting and
truncation
- Updates the affected status/title setup snapshots

Added to `/statusline`:
- status
- task-progress
  
Normalized in `/statusline`:
- model-name -> model
- project-root -> project-name

Added to `/title`:
- current-dir
- context-remaining
- context-used
- five-hour-limit
- weekly-limit
- codex-version
- used-tokens
- total-input-tokens
- total-output-tokens
- session-id
- fast-mode
- model-with-reasoning

Normalized in `/title`:
- project -> project-name
- thread -> thread-title
- model-name -> model
2026-04-21 16:13:09 -07:00
maja-openai
ef00014a46 Allow guardian bare allow output (#18797)
## Summary

Allow guardian to skip other fields and output only
`{"outcome":"allow"}` when the command is low risk.
This change lets guardian reviews use a non-strict text format while
keeping the JSON schema itself as plain user-visible schema data, so
transport strictness is carried out-of-band instead of through a schema
marker key.

## What changed

- Add an explicit `output_schema_strict` flag to model prompts and pass
it into `codex-api` text formatting.
- Set guardian reviewer prompts to non-strict schema validation while
preserving strict-by-default behavior for normal callers.
- Update the guardian output contract so definitely-low-risk decisions
may return only `{"outcome":"allow"}`.
- Treat bare allow responses as low-risk approvals in the guardian
parser.
- Add tests and snapshots covering the non-strict guardian request and
optional guardian output fields.

## Verification

- `cargo test -p codex-core guardian::tests::guardian`
- `cargo test -p codex-core guardian::tests::`
- `cargo test -p codex-core client_common::tests::`
- `cargo test -p codex-protocol
user_input_serialization_includes_final_output_json_schema`
- `cargo test -p codex-api`
- `git diff --check`

Note: `cargo test -p codex-core` was also attempted, but this desktop
environment injects ambient config/proxy state that causes unrelated
config/session tests expecting pristine defaults to fail.

---------

Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-04-21 15:37:12 -07:00
starr-openai
ddbe2536be Support multiple managed environments (#18401)
## Summary
- refactor EnvironmentManager to own keyed environments with
default/local lookup helpers
- keep remote exec-server client creation lazy until exec/fs use
- preserve disabled agent environment access separately from internal
local environment access

## Validation
- not run (per Codex worktree instruction to avoid tests/builds unless
requested)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-21 15:29:35 -07:00
cassirer-openai
27d9673273 [rollout_trace] Add rollout trace crate (#18876)
## Summary

Adds the standalone `codex-rollout-trace` crate, which defines the raw
trace event format, replay/reduction model, writer, and reducer logic
for reconstructing model-visible conversation/runtime state from
recorded rollout data.

The crate-level design is documented in
[`codex-rs/rollout-trace/README.md`](https://github.com/openai/codex/blob/codex/rollout-trace-crate/codex-rs/rollout-trace/README.md).

## Stack

This is PR 1/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This PR intentionally does not wire tracing into live Codex execution.
It establishes the data model and reducer contract first, with
crate-local tests covering conversation reconstruction, compaction
boundaries, tool/session edges, and code-cell lifecycle reduction. Later
PRs emit into this model.

The README is the best entry point for reviewing the intended trace
format and reduction semantics before diving into the reducer modules.
2026-04-21 21:54:05 +00:00
Shijie Rao
c5e9c6f71f Preserve Cloudfare HTTP cookies in codex (#17783)
## Summary
- Adds a process-local, in-memory cookie store for ChatGPT HTTP clients.
- Limits cookie storage and replay to a shared ChatGPT host allowlist.
- Wires the shared store into the default Codex reqwest client and
backend client.
- Shares the ChatGPT host allowlist with remote-control URL validation
to avoid drift.
- Enables reqwest cookie support and updates lockfiles.
2026-04-21 14:40:15 -07:00
efrazer-oai
be75785504 fix: fully revert agent identity runtime wiring (#18757)
## Summary

This PR fully reverts the previously merged Agent Identity runtime
integration from the old stack:
https://github.com/openai/codex/pull/17387/changes

It removes the Codex-side task lifecycle wiring, rollout/session
persistence, feature flag plumbing, lazy `auth.json` mutation,
background task auth paths, and request callsite changes introduced by
that stack.

This leaves the repo in a clean pre-AgentIdentity integration state so
the follow-up PRs can reintroduce the pieces in smaller reviewable
layers.

## Stack

1. This PR: full revert
2. https://github.com/openai/codex/pull/18871: move Agent Identity
business logic into a crate
3. https://github.com/openai/codex/pull/18785: add explicit
AgentIdentity auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate auth callsites
through AuthProvider

## Testing

Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
2026-04-21 14:30:55 -07:00
Ruslan Nigmatullin
69c3d12274 app-server: implement device key v2 methods (#18430)
## Why

The device-key protocol needs an app-server implementation that keeps
local key operations behind the same request-processing boundary as
other v2 APIs.

app-server owns request dispatch, transport policy, documentation, and
JSON-RPC error shaping. `codex-device-key` owns key binding, validation,
platform provider selection, and signing mechanics. Keeping the adapter
thin makes the boundary easier to review and avoids moving local
key-management details into thread orchestration code.

## What changed

- Added `DeviceKeyApi` as the app-server adapter around
`DeviceKeyStore`.
- Converted protocol protection policies, payload variants, algorithms,
and protection classes to and from the device-key crate types.
- Encoded SPKI public keys and DER signatures as base64 protocol fields.
- Routed `device/key/create`, `device/key/public`, and `device/key/sign`
through `MessageProcessor`.
- Rejected remote transports before provider access while allowing local
`stdio` and in-process callers to reach the device-key API.
- Added stdio, in-process, and websocket tests for device-key validation
and transport policy.
- Documented the device-key methods in the app-server v2 method list.

## Test coverage

- `device_key_create_rejects_empty_account_user_id`
- `in_process_allows_device_key_requests_to_reach_device_key_api`
- `device_key_methods_are_rejected_over_websocket`

## Stack

This is PR 3 of 4 in the device-key app-server stack. It is stacked on
#18429.

## Validation

- `cargo test -p codex-app-server device_key`
- `just fix -p codex-app-server`
2026-04-21 14:07:08 -07:00
Felipe Coury
e502f0b52d feat(tui): shortcuts to change reasoning level temporarily (#18866)
## Summary

Adds main-chat shortcuts for changing reasoning effort one step at a
time:

- `Alt+,` lowers reasoning (has the `<` arrow on the key)
- `Alt+.` raises reasoning (similarly, has the `>` arrow)

The shortcut updates the active session only. It does not persist the
selected reasoning level as the default for future sessions. In Plan
mode, it applies temporarily to Plan mode without opening the
global-vs-Plan scope prompt.

## Details

The shortcut uses the active model preset to decide which reasoning
levels are valid. If the current session has no explicit reasoning
effort, it starts from the model default. Each keypress moves to the
next supported level in the requested direction.

The shortcut only runs from the main chat surface. If a popup or modal
is open, input remains owned by that UI.

In Plan mode, the shortcut updates the in-memory Plan reasoning override
directly. The model/reasoning picker still keeps the existing scope
prompt for explicit picker changes.

## Notes

Ctrl-plus and Ctrl-minus were considered, but terminals do not deliver
those combinations consistently, so this PR uses Alt shortcuts instead.

If the current effort is unsupported by the selected model, the shortcut
skips to the nearest supported level in the requested direction. If
there is no valid step, it shows the existing boundary message.

## Tests

- `cargo test -p codex-tui reasoning_shortcuts`
- `cargo test -p codex-tui reasoning_effort`
- `cargo test -p codex-tui reasoning_shortcut`
- `cargo test -p codex-tui footer_snapshots`
- `cargo test -p codex-tui`
- `just fix -p codex-tui`
- `./tools/argument-comment-lint/run.py -p codex-tui -- --tests`

---------

Co-authored-by: Eric Traut <etraut@openai.com>
2026-04-21 18:04:03 -03:00
pakrym-oai
ffa6944587 Load app-server config through ConfigManager (#18870)
## Summary
- Load app-server startup config through `ConfigManager` instead of
direct `ConfigBuilder` calls.
- Move `ConfigManager` constructor-owned state (`cli_overrides`, runtime
feature map, cloud requirements loader) behind internal manager fields.
- Pass `ConfigManager` into `MessageProcessor` directly instead of
reconstructing it from raw args.

## Tests
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server`
- `just fix -p codex-app-server`
- `just fmt`
2026-04-21 14:01:02 -07:00
jif-oai
15b8cde2a4 chore: default multi-agent v2 fork to all (#18873)
Default sub-agents v2 to `all` for the fork mode
2026-04-21 21:54:58 +01:00
iceweasel-oai
6f6997758a skip busted tests while I fix them (#18885) 2026-04-21 13:40:34 -07:00
Ruslan Nigmatullin
56375712e3 app-server: fix Bazel clippy in tracing tests (#18872)
## Why

PR #18431 exposed a Bazel clippy failure in the app-server unit-test
target across Linux, macOS, and Windows. The failing lint was
`clippy::await_holding_invalid_type`: two tracing tests serialized
access to global tracing state by holding a `tokio::sync::MutexGuard`
across awaited test work.

That serialization is still needed because the tests share
process-global tracing setup and exporter state, but it should not
require holding an async mutex guard through the whole test body.

## What changed

- Replaced the bespoke async `tracing_test_guard` helper with
`serial_test` on the two tracing tests that need global tracing
serialization.
- Removed the `#[expect(clippy::await_holding_invalid_type)]`
annotations and the lock guard callsites that Bazel clippy rejected.

## Validation

- `cargo test -p codex-app-server jsonrpc_span`
- `just fix -p codex-app-server`
- `git diff --check`

I also attempted the exact failing Bazel clippy target locally with
BuildBuddy disabled: `bazel --noexperimental_remote_repo_contents_cache
build --config=clippy --bes_backend= --remote_cache=
--experimental_remote_downloader= --
//codex-rs/app-server:app-server-unit-tests-bin`. That run did not reach
clippy because Bazel timed out downloading `libcap-2.27.tar.gz` from
`kernel.org`.
2026-04-21 13:10:36 -07:00
Ruslan Nigmatullin
5bab04dcd7 app-server: add codex-device-key crate (#18429)
## Why

Device-key storage and signing are local security-sensitive operations
with platform-specific behavior. Keeping the core API in
`codex-device-key` keeps app-server focused on routing and business
logic instead of owning key-management details.

The crate keeps the signing surface intentionally narrow: callers can
create a bound key, fetch its public key, or sign one of the structured
payloads accepted by the crate. It does not expose a generic
arbitrary-byte signing API.

Key IDs cross into platform-specific labels, tags, and metadata paths,
so externally supplied IDs are constrained to the same auditable
namespace created by the crate: `dk_` followed by unpadded base64url for
32 bytes. Remote-control target paths are also tied to each signed
payload shape so connection proofs cannot be reused for enrollment
endpoints, or vice versa.

## What changed

- Added the `codex-device-key` workspace crate.
- Added account/client-bound key creation with stable `dk_` key IDs.
- Added strict `key_id` validation before public-key lookup or signing
reaches a provider.
- Added public-key lookup and structured signing APIs.
- Split remote-control client endpoint allowlists by connection vs
enrollment payload shape.
- Added validation for key bindings, accepted payload fields, token
expiration, and payload/key binding mismatches.
- Added flow-oriented docs on the validation helpers that gate provider
signing.
- Added protection policy and protection-class types without wiring a
platform provider yet.
- Added an unsupported default provider so platforms without an
implementation fail explicitly instead of silently falling back to
software-backed keys.
- Updated Cargo and Bazel lock metadata for the new crate and
non-platform-specific dependencies.

## Stack

This is stacked on #18428.

## Validation

- `cargo test -p codex-device-key`
- Added unit coverage for strict `key_id` validation before provider
use.
- Added unit coverage that rejects remote-control paths from the wrong
signed payload shape.
- `just bazel-lock-update`
- `just bazel-lock-check`
2026-04-21 17:57:00 +00:00
iceweasel-oai
8612714aa6 Add Windows sandbox unified exec runtime support (#15578)
## Summary

This is the runtime/foundation half of the Windows sandbox unified-exec
work.

- add Windows sandbox `unified_exec` session support in
`windows-sandbox-rs` for both:
  - the legacy restricted-token backend
  - the elevated runner backend
- extend the PTY/process runtime so driver-backed sessions can support:
  - stdin streaming
  - stdout/stderr separation
  - exit propagation
  - PTY resize hooks
- add Windows sandbox runtime coverage in `codex-windows-sandbox` /
`codex-utils-pty`

This PR does **not** enable Windows sandbox `UnifiedExec` for product
callers yet because hooking this up to app-server comes in the next PR.

Windows sandbox advertising is intentionally kept aligned with `main`,
so sandboxed Windows callers still fall back to `ShellCommand`.

This PR isolates the runtime/session layer so it can be reviewed
independently from product-surface enablement.

---------

Co-authored-by: jif-oai <jif@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-04-21 10:44:49 -07:00
Steve Coffey
38ba876ea9 Refresh generated Python app-server SDK types (#18862)
This is the first step in splitting the Python SDK PyPI publish work
into reviewable layers: land the generated SDK refresh by itself before
changing packaging mechanics. The next PRs will make the runtime wheel
publishable, then wire the SDK package/version pinning to that runtime.

## Summary
- Refresh generated Python app-server v2 models and notification
registry from the current schema.
- Update the public API signature expectations for the newly generated
kwargs.

## Stack
- PR 1 of 3 for the Python SDK PyPI publishing split.
- Follow-up PRs will handle runtime wheel publishing mechanics, then
SDK/package version pinning.

## Tests
- `uv run --extra dev pytest` in `sdk/python` -> 51 passed, 37 skipped.
2026-04-21 10:23:27 -07:00
Michael Bolin
f8562bd47b sandboxing: intersect permission profiles semantically (#18275)
## Why

Permission approval responses must not be able to grant more access than
the tool requested. Moving this flow to `PermissionProfile` means the
comparison must be profile-shaped instead of `SandboxPolicy`-shaped, and
cwd-relative special paths such as `:cwd` and `:project_roots` must stay
anchored to the turn that produced the request.

## What changed

This implements semantic `PermissionProfile` intersection in
`codex-sandboxing` for file-system and network permissions. The
intersection accepts narrower path grants, rejects broader grants,
preserves deny-read carve-outs and glob scan depth, and materializes
cwd-dependent special-path grants to absolute paths before they can be
recorded for reuse.

The request-permissions response paths now use that intersection
consistently. App-server captures the request turn cwd before waiting
for the client response, includes that cwd in the v2 approval params,
and core stores the requested profile plus cwd for direct TUI/client
responses and Guardian decisions before recording turn- or
session-scoped grants. The TUI app-server bridge now preserves the
app-server request cwd when converting permission approval params into
core events.

## Verification

- `cargo test -p codex-sandboxing intersect_permission_profiles --
--nocapture`
- `cargo test -p codex-app-server request_permissions_response --
--nocapture`
- `cargo test -p codex-core
request_permissions_response_materializes_session_cwd_grants_before_recording
-- --nocapture`
- `cargo check -p codex-tui --tests`
- `cargo check --tests`
- `cargo test -p codex-tui
app_server_request_permissions_preserves_file_system_permissions`
2026-04-21 10:23:01 -07:00
pakrym-oai
2a226096f6 Split DeveloperInstructions into individual fragments. (#18813)
Split DeveloperInstructions into individual fragments.
2026-04-21 10:22:36 -07:00
pakrym-oai
5fe767e8e1 Refactor app-server config loading into ConfigManager (#18442)
Localize app-server configuration loading in one place.
2026-04-21 10:22:26 -07:00
Eric Traut
4ed722ab8d Move TUI app tests to modules they cover (#18799)
## Summary

The TUI app refactor in #18753 moved the old `app.rs` tests into a
single `app/tests.rs` file. That kept the split mechanically simple, but
it left several focused unit tests far from the modules they exercise.

This PR is a follow-up that moves tests next to the code they cover.

It also adds `tui/src/app/test_support.rs` for shared fixture
construction.

This is just a mechanical refactoring (no functional changes) and does
not affect any production code.
2026-04-21 10:16:51 -07:00
jif-oai
10e1659d4f Stabilize debug clear memories integration test (#18858)
## Why

`debug_clear_memories_resets_state_and_removes_memory_dir` can be flaky
because the test drops its `sqlx::SqlitePool` immediately before
invoking `codex debug clear-memories`. Dropping the pool does not wait
for all SQLite connections to close, so the CLI can race with still-open
test connections.

## What changed

- Await `pool.close()` before spawning `codex debug clear-memories`.
- Close the reopened verification pool before the temp `CODEX_HOME` is
torn down.

## Verification

- `cargo test -p codex-cli --test debug_clear_memories
debug_clear_memories_resets_state_and_removes_memory_dir`
2026-04-21 18:15:37 +01:00
Eric Traut
b7fec54354 Queue follow-up input during user shell commands (#18820)
Fixes #17954.

## Why
When a manual shell command like `!sleep 10` is running, submitting
plain text such as `hi` currently sends that text as a steer for the
active shell turn. User shell turns are not steerable like model turns,
so the TUI can remain stuck in `Working` after the shell command
finishes.

## What Changed
- Detect when the only active work is one or more
`ExecCommandSource::UserShell` commands.
- Queue plain submitted input in that state so it drains after the shell
command and shell turn complete.
- Preserve `!cmd` submissions during running work so explicit shell
commands keep their existing behavior.
- Add regression coverage for the `!sleep 10` plus `hi` flow in
`chatwidget::tests::exec_flow::user_message_during_user_shell_command_is_queued_not_steered`.

## Verification
- Manually confirmed hang before the fix and no hang after the fix
2026-04-21 10:13:13 -07:00
Casey Chow
41652665f5 [codex] Add tmux-aware OSC 9 notifications (#17836)
## Summary
- wrap OSC 9 notifications in tmux's DCS passthrough so terminal
notifications make it through tmux
- use codex-terminal-detection for OSC 9 auto-selection so tmux sessions
inherit the underlying client terminal support
- add focused notification backend tests for plain OSC 9 and
tmux-wrapped output

## Stack
- base PR: #18479
- review order: #18479, then this PR

## Why
Tmux does not forward OSC 9 notifications directly; the sequence has to
be wrapped in tmux's DCS passthrough envelope. Codex also had local
notification heuristics that could miss supported terminals when running
under tmux, even though codex-terminal-detection already knows how to
attribute tmux sessions to the client terminal.

## Validation
- `just fmt`
- `cargo test -p codex-tui` *(currently blocked by an unrelated existing
compile error in `app-server/src/message_processor.rs:754` referencing
`connection_id` out of scope; not caused by this change)*

Co-authored-by: Codex <noreply@openai.com>
2026-04-21 17:10:36 +00:00
1560 changed files with 181725 additions and 70124 deletions

View File

@@ -29,10 +29,13 @@ common:linux --test_env=PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
common:macos --test_env=PATH=/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
# Pass through some env vars Windows needs to use powershell?
common:windows --test_env=PATH
common:windows --test_env=SYSTEMROOT
common:windows --test_env=COMSPEC
common:windows --test_env=WINDIR
# Rust's libtest harness runs test bodies on std-spawned threads. The default
# 2 MiB stack can be too small for large async test futures on Windows CI; see
# https://github.com/openai/codex/pull/19067 for the motivating failure.
common --test_env=RUST_MIN_STACK=8388608 # 8 MiB
common --test_output=errors
common --bes_results_url=https://app.buildbuddy.io/invocation/

View File

@@ -1,5 +1,6 @@
iTerm
iTerm2
psuedo
SOM
te
TE

View File

@@ -1,6 +1,6 @@
[codespell]
# Ref: https://github.com/codespell-project/codespell#using-a-config-file
skip = .git*,vendor,*-lock.yaml,*.lock,.codespellrc,*test.ts,*.jsonl,frame*.txt,*.snap,*.snap.new,*meriyah.umd.min.js
skip = .git*,vendor,*-lock.yaml,*.lock,.codespellrc,*test.ts,*.jsonl,frame*.txt,*.snap,*.snap.new
check-hidden = true
ignore-regex = ^\s*"image/\S+": ".*|\b(afterAll)\b
ignore-words-list = ratatui,ser,iTerm,iterm2,iterm,te,TE,PASE,SEH

View File

@@ -0,0 +1,11 @@
# THIS IS AUTOGENERATED. DO NOT EDIT MANUALLY
version = 1
name = "codex"
[setup]
script = ""
[[actions]]
name = "Run"
icon = "run"
command = "cargo +1.93.0 run --manifest-path=codex-rs/Cargo.toml --bin codex -- -c mcp_oauth_credentials_store=file"

View File

@@ -27,10 +27,10 @@ Accept any of the following:
2. Run the watcher script to snapshot PR/review/CI state (or consume each streamed snapshot from `--watch`).
3. Inspect the `actions` list in the JSON response.
4. If `diagnose_ci_failure` is present, inspect failed run logs and classify the failure.
5. If the failure is likely caused by the current branch, patch code locally, commit, and push.
5. If the failure is likely caused by the current branch, patch code locally, commit, and push. Do not patch random flaky tests, CI infrastructure, dependency outages, runner issues, or other failures that are unrelated to the branch.
6. If `process_review_comment` is present, inspect surfaced review items and decide whether to address them.
7. If a review item is actionable and correct, patch code locally, commit, push, and then mark the associated review thread/comment as resolved once the fix is on GitHub.
8. If a review item from another author is non-actionable, already addressed, or not valid, post one reply on the comment/thread explaining that decision (for example answering the question or explaining why no change is needed). Prefix the GitHub reply body with `[codex]` so it is clear the response is automated. If the watcher later surfaces your own reply, treat that self-authored item as already handled and do not reply again.
8. Do not post replies to human-authored review comments/threads unless the user explicitly confirms the exact response. If a human review item is non-actionable, already addressed, or not valid, surface the item and recommended response to the user instead of replying on GitHub.
9. If the failure is likely flaky/unrelated and `retry_failed_checks` is present, rerun failed jobs with `--retry-failed-now`.
10. If both actionable review feedback and `retry_failed_checks` are present, prioritize review feedback first; a new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
11. On every loop, look for newly surfaced review feedback before acting on CI failures or mergeability state, then verify mergeability / merge-conflict status (for example via `gh pr view`) alongside CI.
@@ -69,12 +69,18 @@ python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr <number-or-url> --o
Use `gh` commands to inspect failed runs before deciding to rerun.
- `gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha`
- `gh run view <run-id> --log-failed`
- `gh api repos/<owner>/<repo>/actions/runs/<run-id>/jobs -X GET -f per_page=100`
- `gh api repos/<owner>/<repo>/actions/jobs/<job-id>/logs > /tmp/codex-gh-job-<job-id>-logs.zip`
- `gh run view <run-id> --log-failed` as a fallback after the overall workflow run is complete
Prefer treating failures as branch-related when logs point to changed code (compile/test/lint/typecheck/snapshots/static analysis in touched areas).
`gh run view --log-failed` is workflow-run scoped and may not expose failed-job logs until the overall run finishes. For faster diagnosis, poll the run's jobs first and, as soon as a specific job has failed, fetch that job's logs directly from the Actions job logs endpoint. The watcher includes a `failed_jobs` list with each failed job's `job_id` and `logs_endpoint` when GitHub exposes one.
Prefer treating failures as branch-related when failed-job logs point to changed code (compile/test/lint/typecheck/snapshots/static analysis in touched areas).
Prefer treating failures as flaky/unrelated when logs show transient infra/external issues (timeouts, runner provisioning failures, registry/network outages, GitHub Actions infra errors).
Do not attempt to fix flaky/unrelated failures by changing tests, build scripts, CI configuration, dependency pins, or infrastructure-adjacent code unless the logs clearly connect the failure to the PR branch. For flaky/unrelated failures, rerun only when the watcher recommends `retry_failed_checks`; otherwise wait or stop for user help.
If classification is ambiguous, perform one manual diagnosis attempt before choosing rerun.
Read `.codex/skills/babysit-pr/references/heuristics.md` for a concise checklist.
@@ -99,7 +105,8 @@ When you agree with a comment and it is actionable:
5. Resume watching on the new SHA immediately (do not stop after reporting the push).
6. If monitoring was running in `--watch` mode, restart `--watch` immediately after the push in the same turn; do not wait for the user to ask again.
If you disagree or the comment is non-actionable/already addressed, reply once directly on the GitHub comment/thread so the reviewer gets an explicit answer, then continue the watcher loop. Prefix any GitHub reply to a code review comment/thread with `[codex]` so it is clear the response is automated and not from the human user. If the watcher later surfaces your own reply because the authenticated operator is treated as a trusted review author, treat that self-authored item as already handled and do not reply again.
Do not post replies to human-authored GitHub review comments/threads automatically. If you disagree with a human comment, believe it is non-actionable/already addressed, or need to answer a question, report the item to the user with a suggested response and wait for explicit confirmation before posting anything on GitHub. If the user approves a response, prefix it with `[codex]` so it is clear the response is automated and not from the human user.
If the watcher later surfaces your own approved reply because the authenticated operator is treated as a trusted review author, treat that self-authored item as already handled and do not reply again.
If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.
## Git Safety Rules
@@ -125,11 +132,11 @@ Use this loop in a live Codex session:
2. Read `actions`.
3. First check whether the PR is now merged or otherwise closed; if so, report that terminal state and stop polling immediately.
4. Check CI summary, new review items, and mergeability/conflict status.
5. Diagnose CI failures and classify branch-related vs flaky/unrelated.
6. For each surfaced review item from another author, either reply once with an explanation if it is non-actionable or patch/commit/push and then resolve it if it is actionable. If a later snapshot surfaces your own reply, treat it as informational and continue without responding again.
5. Diagnose CI failures and classify branch-related vs flaky/unrelated. If the overall run is still pending but `failed_jobs` already includes a failed job, fetch that job's logs and diagnose immediately instead of waiting for the whole workflow run to finish. Patch only when the failure is branch-related.
6. For each surfaced review item from another author, patch/commit/push and then resolve it if it is actionable. If it is non-actionable, already addressed, or requires a written answer, surface it to the user with a suggested response instead of posting automatically. If a later snapshot surfaces your own approved reply, treat it as informational and continue without responding again.
7. Process actionable review comments before flaky reruns when both are present; if a review fix requires a commit, push it and skip rerunning failed checks on the old SHA.
8. Retry failed checks only when `retry_failed_checks` is present and you are not about to replace the current SHA with a review/CI fix commit.
9. If you pushed a commit, resolved a review thread, replied to a review comment, or triggered a rerun, report the action briefly and continue polling (do not stop).
8. Retry failed checks only when `retry_failed_checks` is present and you are not about to replace the current SHA with a review/CI fix commit. Do not make code changes for unrelated flakes or infrastructure failures just to get CI green.
9. If you pushed a commit, resolved a review thread, or triggered a rerun, report the action briefly and continue polling (do not stop). If a human review comment needs a written GitHub response, stop and ask for confirmation before posting.
10. After a review-fix push, proactively restart continuous monitoring (`--watch`) in the same turn unless a strict stop condition has already been reached.
11. If everything is passing, mergeable, not blocked on required review approval, and there are no unaddressed review items, report that the PR is currently ready to merge but keep the watcher running so new review comments are surfaced quickly while the PR remains open.
12. If blocked on a user-help-required issue (infra outage, exhausted flaky retries, unclear reviewer request, permissions), report the blocker and stop.

View File

@@ -1,4 +1,4 @@
interface:
display_name: "PR Babysitter"
short_description: "Watch PR review comments, CI, and merge conflicts"
default_prompt: "Babysit the current PR: monitor reviewer comments, CI, and merge-conflict status (prefer the watchers --watch mode for live monitoring); surface new review feedback before acting on CI or mergeability work, fix valid issues, push updates, and rerun flaky failures up to 3 times. Keep exactly one watcher session active for the PR (do not leave duplicate --watch terminals running). If you pause monitoring to patch review/CI feedback, restart --watch yourself immediately after the push in the same turn. If a watcher is still running and no strict stop condition has been reached, the task is still in progress: keep consuming watcher output and sending progress updates instead of ending the turn. Do not treat a green + mergeable PR as a terminal stop while it is still open; continue polling autonomously after any push/rerun so newly posted review comments are surfaced until a strict terminal stop condition is reached or the user interrupts."
default_prompt: "Babysit the current PR: monitor reviewer comments, CI, and merge-conflict status (prefer the watchers --watch mode for live monitoring); surface new review feedback before acting on CI or mergeability work, fix valid issues, push updates, and rerun flaky failures up to 3 times. Do not post replies to human-authored review comments unless the user explicitly confirms the exact response. Do not patch unrelated flaky tests, CI infrastructure, dependency outages, runner issues, or other failures that are not caused by the branch. Keep exactly one watcher session active for the PR (do not leave duplicate --watch terminals running). If you pause monitoring to patch review/CI feedback, restart --watch yourself immediately after the push in the same turn. If a watcher is still running and no strict stop condition has been reached, the task is still in progress: keep consuming watcher output and sending progress updates instead of ending the turn. Do not treat a green + mergeable PR as a terminal stop while it is still open; continue polling autonomously after any push/rerun so newly posted review comments are surfaced until a strict terminal stop condition is reached or the user interrupts."

View File

@@ -23,9 +23,11 @@ Used to discover failed workflow runs and rerunnable run IDs.
### Failed log inspection
- `gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha`
- `gh api repos/{owner}/{repo}/actions/runs/{run_id}/jobs -X GET -f per_page=100`
- `gh api repos/{owner}/{repo}/actions/jobs/{job_id}/logs > /tmp/codex-gh-job-{job_id}-logs.zip`
- `gh run view <run-id> --log-failed`
Used by Codex to classify branch-related vs flaky/unrelated failures.
Used by Codex to classify branch-related vs flaky/unrelated failures. Prefer the direct job log endpoint as soon as a job has failed because `gh run view --log-failed` may not produce failed-job logs until the overall workflow run completes.
### Retry failed jobs only
@@ -70,3 +72,11 @@ Reruns only failed jobs (and dependencies) for a workflow run.
- `conclusion`
- `html_url`
- `head_sha`
### Actions run jobs API (`jobs[]`)
- `id`
- `name`
- `status`
- `conclusion`
- `html_url`

View File

@@ -18,6 +18,8 @@ Treat as **likely flaky or unrelated** when evidence points to transient or exte
- Cloud/service rate limits or transient API outages
- Non-deterministic failures in unrelated integration tests with known flake patterns
Do not patch likely flaky/unrelated failures. Use the retry budget for rerunnable failures, wait for pending jobs, or stop and report the blocker when the failure is persistent or infrastructure-owned.
If uncertain, inspect failed logs once before choosing rerun.
## Decision tree (fix vs rerun vs stop)
@@ -25,9 +27,11 @@ If uncertain, inspect failed logs once before choosing rerun.
1. If PR is merged/closed: stop.
2. If there are failed checks:
- Diagnose first.
- If checks are still pending but an individual job has already failed: fetch that job's logs and diagnose now.
- If branch-related: fix locally, commit, push.
- If likely flaky/unrelated and all checks for the current SHA are terminal: rerun failed jobs.
- If checks are still pending: wait.
- If likely flaky/unrelated and not safely rerunnable: stop and report the blocker; do not edit unrelated tests, build scripts, CI configuration, dependency pins, or infrastructure code.
- If checks are still pending and no failed job is available yet: wait.
3. If flaky reruns for the same SHA reach the configured limit (default 3): stop and report persistent failure.
4. Independently, process any new human review comments.
@@ -40,12 +44,15 @@ Address the comment when:
- The requested change does not conflict with the users intent or recent guidance.
- The change can be made safely without unrelated refactors.
Fix valid human review feedback in code when possible, but do not post a GitHub reply to a human-authored comment/thread unless the user explicitly confirms the exact response.
Do not auto-fix when:
- The comment is ambiguous and needs clarification.
- The request conflicts with explicit user instructions.
- The proposed change requires product/design decisions the user has not made.
- The codebase is in a dirty/unrelated state that makes safe editing uncertain.
- The comment only needs a written answer or disagreement response; propose the reply to the user instead of posting it automatically.
## Stop-and-ask conditions
@@ -56,3 +63,4 @@ Stop and ask the user instead of continuing automatically when:
- The PR branch cannot be pushed.
- CI failures persist after the flaky retry budget.
- Reviewer feedback requires a product decision or cross-team coordination.
- A human review comment requires a written GitHub reply instead of a code change.

View File

@@ -338,6 +338,66 @@ def failed_runs_from_workflow_runs(runs, head_sha):
return failed_runs
def get_jobs_for_run(repo, run_id):
endpoint = f"repos/{repo}/actions/runs/{run_id}/jobs"
data = gh_json(["api", endpoint, "-X", "GET", "-f", "per_page=100"], repo=repo)
if not isinstance(data, dict):
raise GhCommandError("Unexpected payload from actions run jobs API")
jobs = data.get("jobs") or []
if not isinstance(jobs, list):
raise GhCommandError("Expected `jobs` to be a list")
return jobs
def failed_jobs_from_workflow_runs(repo, runs, head_sha):
failed_jobs = []
for run in runs:
if not isinstance(run, dict):
continue
if str(run.get("head_sha") or "") != head_sha:
continue
run_id = run.get("id")
if run_id in (None, ""):
continue
run_status = str(run.get("status") or "")
run_conclusion = str(run.get("conclusion") or "")
if run_status.lower() == "completed" and run_conclusion not in FAILED_RUN_CONCLUSIONS:
continue
jobs = get_jobs_for_run(repo, run_id)
for job in jobs:
if not isinstance(job, dict):
continue
conclusion = str(job.get("conclusion") or "")
if conclusion not in FAILED_RUN_CONCLUSIONS:
continue
job_id = job.get("id")
logs_endpoint = None
if job_id not in (None, ""):
logs_endpoint = f"repos/{repo}/actions/jobs/{job_id}/logs"
failed_jobs.append(
{
"run_id": run_id,
"workflow_name": run.get("name") or run.get("display_title") or "",
"run_status": run_status,
"run_conclusion": run_conclusion,
"job_id": job_id,
"job_name": str(job.get("name") or ""),
"status": str(job.get("status") or ""),
"conclusion": conclusion,
"html_url": str(job.get("html_url") or ""),
"logs_endpoint": logs_endpoint,
}
)
failed_jobs.sort(
key=lambda item: (
str(item.get("workflow_name") or ""),
str(item.get("job_name") or ""),
str(item.get("job_id") or ""),
)
)
return failed_jobs
def get_authenticated_login():
data = gh_json(["api", "user"])
if not isinstance(data, dict) or not data.get("login"):
@@ -568,7 +628,7 @@ def is_pr_ready_to_merge(pr, checks_summary, new_review_items):
return True
def recommend_actions(pr, checks_summary, failed_runs, new_review_items, retries_used, max_retries):
def recommend_actions(pr, checks_summary, failed_runs, failed_jobs, new_review_items, retries_used, max_retries):
actions = []
if pr["closed"] or pr["merged"]:
if new_review_items:
@@ -583,7 +643,7 @@ def recommend_actions(pr, checks_summary, failed_runs, new_review_items, retries
if new_review_items:
actions.append("process_review_comment")
has_failed_pr_checks = checks_summary["failed_count"] > 0
has_failed_pr_checks = checks_summary["failed_count"] > 0 or bool(failed_jobs)
if has_failed_pr_checks:
if checks_summary["all_terminal"] and retries_used >= max_retries:
actions.append("stop_exhausted_retries")
@@ -621,12 +681,14 @@ def collect_snapshot(args):
checks_summary = summarize_checks(checks)
workflow_runs = get_workflow_runs_for_sha(pr["repo"], pr["head_sha"])
failed_runs = failed_runs_from_workflow_runs(workflow_runs, pr["head_sha"])
failed_jobs = failed_jobs_from_workflow_runs(pr["repo"], workflow_runs, pr["head_sha"])
retries_used = current_retry_count(state, pr["head_sha"])
actions = recommend_actions(
pr,
checks_summary,
failed_runs,
failed_jobs,
new_review_items,
retries_used,
args.max_flaky_retries,
@@ -641,6 +703,7 @@ def collect_snapshot(args):
"pr": pr,
"checks": checks_summary,
"failed_runs": failed_runs,
"failed_jobs": failed_jobs,
"new_review_items": new_review_items,
"actions": actions,
"retry_state": {

View File

@@ -75,6 +75,11 @@ def test_collect_snapshot_fetches_review_items_before_ci(monkeypatch, tmp_path):
"failed_runs_from_workflow_runs",
lambda *args, **kwargs: call_order.append("failed_runs") or [],
)
monkeypatch.setattr(
gh_pr_watch,
"failed_jobs_from_workflow_runs",
lambda *args, **kwargs: call_order.append("failed_jobs") or [],
)
monkeypatch.setattr(
gh_pr_watch,
"recommend_actions",
@@ -100,6 +105,7 @@ def test_recommend_actions_prioritizes_review_comments():
sample_pr(),
sample_checks(failed_count=1),
[{"run_id": 99}],
[],
[{"kind": "review_comment", "id": "1"}],
0,
3,
@@ -119,6 +125,7 @@ def test_run_watch_keeps_polling_open_ready_to_merge_pr(monkeypatch):
"pr": sample_pr(),
"checks": sample_checks(),
"failed_runs": [],
"failed_jobs": [],
"new_review_items": [],
"actions": ["ready_to_merge"],
"retry_state": {
@@ -153,3 +160,58 @@ def test_run_watch_keeps_polling_open_ready_to_merge_pr(monkeypatch):
assert sleeps == [30, 30]
assert [event for event, _ in events] == ["snapshot", "snapshot"]
def test_failed_jobs_include_direct_logs_endpoint(monkeypatch):
jobs_by_run = {
99: [
{
"id": 555,
"name": "unit tests",
"status": "completed",
"conclusion": "failure",
"html_url": "https://github.com/openai/codex/actions/runs/99/job/555",
},
{
"id": 556,
"name": "lint",
"status": "completed",
"conclusion": "success",
},
]
}
monkeypatch.setattr(
gh_pr_watch,
"get_jobs_for_run",
lambda repo, run_id: jobs_by_run[run_id],
)
failed_jobs = gh_pr_watch.failed_jobs_from_workflow_runs(
"openai/codex",
[
{
"id": 99,
"name": "CI",
"status": "in_progress",
"conclusion": "",
"head_sha": "abc123",
}
],
"abc123",
)
assert failed_jobs == [
{
"run_id": 99,
"workflow_name": "CI",
"run_status": "in_progress",
"run_conclusion": "",
"job_id": 555,
"job_name": "unit tests",
"status": "completed",
"conclusion": "failure",
"html_url": "https://github.com/openai/codex/actions/runs/99/job/555",
"logs_endpoint": "repos/openai/codex/actions/jobs/555/logs",
}
]

View File

@@ -10,3 +10,4 @@ Codex maintains a context (history of messages) that is sent to the model in inf
3. No unbounded items - everything injected in the model context must have a bounded size and a hard cap.
4. No items larger than 10K tokens.
5. Highlight new individual items that can cross >1k tokens as P0. These need an additional manual review.
6. All injected fragments must be defined as structs in `core/context` and implement ContextualUserFragment trait

View File

@@ -0,0 +1,127 @@
---
name: codex-issue-digest
description: Run a GitHub issue digest for openai/codex by feature-area labels, all areas, and configurable time windows. Use when asked to summarize recent Codex bug reports or enhancement requests, especially for owner-specific labels such as tui, exec, app, or similar areas.
---
# Codex Issue Digest
## Objective
Produce a headline-first, insight-oriented digest of `openai/codex` issues for the requested feature-area labels over the previous 24 hours by default. Honor a different duration when the user asks for one, for example "past week" or "48 hours". Default to a summary-only response; include details only when requested.
Include only issues that currently have `bug` or `enhancement` plus at least one requested owner label. If the user asks for all areas or all labels, collect `bug`/`enhancement` issues across all labels.
## Inputs
- Feature-area labels, for example `tui exec`
- `all areas` / `all labels` to scan all current feature labels
- Optional repo override, default `openai/codex`
- Optional time window, default previous 24 hours; examples: `48h`, `7d`, `1w`, `past week`
## Workflow
1. Run the collector from a current Codex repo checkout:
```bash
python3 .codex/skills/codex-issue-digest/scripts/collect_issue_digest.py --labels tui exec --window-hours 24
```
Use `--window "past week"` or `--window-hours 168` when the user asks for a non-default duration. Use `--all-labels` when the user says all areas or all labels.
2. Use the JSON as the source of truth. It includes new issues, new issue comments, new reactions/upvotes, current labels, current reaction counts, model-ready `summary_inputs`, and detailed `digest_rows`.
3. Choose the output mode from the user's request:
- Default mode: start the report with `## Summary` and do not emit `## Details`.
- Details-upfront mode: if the user asks for details, a table, a full digest, "include details", or similar, start with `## Summary`, then include `## Details`.
- Follow-up details mode: if the user asks for more detail after a summary-only digest, produce `## Details` from the existing collector JSON when it is still available; otherwise rerun the collector.
4. In `## Summary`, write a headline-first executive summary:
- The first nonblank line under `## Summary` must be a single-line headline or judgment, not a bullet. It should be useful even if the reader stops there.
- On quiet days, prefer exactly: `No major issues reported by users.` Use this when there are no elevated rows, no newly repeated theme, and nothing that needs owner action.
- When users are surfacing notable issues, make the headline name the count or theme, for example `Two issues are being surfaced by users:`.
- Immediately under an active headline, list only the issues or themes driving attention, ordered by importance. Start each line with the row's `attention_marker` when present, then a concise owner-readable description and inline issue refs.
- Treat `🔥🔥` as headline-worthy and `🔥` as elevated. Do not add fire emoji yourself; only copy the row's `attention_marker`.
- Keep any extra summary detail after the headline to 1-3 terse lines, only when it adds a decision-relevant caveat, repeated theme, or owner action.
- Do not include routine counts, broad stats, or low-signal table summaries in `## Summary` unless they change the headline. Put metadata and optional counts in `## Details` or the footer.
- In default mode, end the report with a concise prompt such as `Want details? I can expand this into the issue table.` Keep this separate from the summary headline so the headline stays clean.
- Cluster and name themes yourself from `summary_inputs`; the collector intentionally does not hard-code issue categories.
- Use a cluster only when the issues genuinely share the same product problem. If several issues merely share a broad platform or label, describe them individually.
- Do not omit a repeated theme just because its individual issues fall below the details table cutoff. Several similar reports should be called out as a repeated customer concern.
- For single-issue rows, summarize the concern directly instead of calling it a cluster.
- Use inline numbered issue links from each relevant row's `ref_markdown`.
- Example quiet summary:
```markdown
## Summary
No major issues reported by users.
Source: collector v4, git `abc123def456`, window `2026-04-27T00:00:00Z` to `2026-04-28T00:00:00Z`.
Want details? I can expand this into the issue table.
```
- Example active summary:
```markdown
## Summary
Two issues are being surfaced by users:
🔥🔥 Terminal launch hangs on startup [1](https://github.com/openai/codex/issues/123)
🔥 Resume switches model providers unexpectedly [2](https://github.com/openai/codex/issues/456)
Source: collector v4, git `abc123def456`, window `2026-04-27T00:00:00Z` to `2026-04-28T00:00:00Z`.
Want details? I can expand this into the issue table.
```
5. In `## Details`, when details are requested, include a compact table only when useful:
- Prefer rows from `digest_rows`; include a `Refs` column using each row's `ref_markdown`.
- Keep the table short; omit low-signal rows when the summary already covers them.
- Use compact columns such as marker, area, type, description, interactions, and refs.
- The `Description` cell should be a short owner-readable phrase. Use row `description`, title, body excerpts, and recent comments, but do not mechanically copy the raw GitHub issue title when it contains incidental details.
- A clear quiet/no-concern sentence when there is no meaningful signal.
6. Use the JSON `attention_marker` exactly. It is empty for normal rows, `🔥` for elevated rows, and `🔥🔥` for very high-attention rows. The actual cutoffs are in `attention_thresholds`.
7. Use inline numbered references where a row or bullet points to issues, for example `Compaction bugs [1](https://github.com/openai/codex/issues/123), [2](https://github.com/openai/codex/issues/456)`. Do not add a separate footnotes section.
8. Label `interactions` as `Interactions`; it counts posts/comments/reactions during the requested window, not unique people.
9. Mention the collector `script_version`, repo checkout `git_head`, and time window in one compact source line. In default mode, put this before the details prompt so the final line still asks whether the user wants details. In details-upfront mode, it can be the footer.
## Reaction Handling
The collector uses GitHub reactions endpoints, which include `created_at`, to count reactions created during the digest window for hydrated issues. It reports both in-window reaction counts and current reaction totals. Treat current reaction totals as standing engagement, and treat `new_reactions` / `new_upvotes` as windowed activity.
By default, the collector fetches issue comments with `since=<window start>` and caps the number of comment pages per issue. This keeps very long historical threads from dominating a digest run and focuses the report on recent posts. Use `--fetch-all-comments` only when exhaustive comment history is more important than runtime.
GitHub issue search is still seeded by issue `updated_at`, so a purely reaction-only issue may be missed if reactions do not bump `updated_at`. Covering every reaction-only case would require either a persisted snapshot store or a broader scan of labeled issues.
## Attention Markers
The collector scales attention markers by the requested time window. The baseline is 5 human user interactions for `🔥` and 10 for `🔥🔥` over 24 hours; longer or shorter windows scale those cutoffs linearly and round up. For example, a one-week report uses 35 and 70 interactions. Human user interactions are human-authored new issue posts, human-authored new comments, and human reactions created during the window, including upvotes. Bot posts and bot reactions are excluded. In prose, explain this as high user interaction rather than naming the emoji.
## Freshness
The automation should run from a repo checkout that contains this skill. For shared daily use, prefer one of these patterns:
- Run the automation in a checkout that is refreshed before the automation starts, for example with `git pull --ff-only`.
- If the automation cannot safely mutate the checkout, have it report the current `git_head` from the collector output so readers know which skill/script version produced the digest.
## Sample Owner Prompt
```text
Use $codex-issue-digest to run the Codex issue digest for labels tui and exec over the previous 24 hours.
```
```text
Use $codex-issue-digest to run the Codex issue digest for all areas over the past week.
```
## Validation
Dry run the collector against recent issues:
```bash
python3 .codex/skills/codex-issue-digest/scripts/collect_issue_digest.py --labels tui exec --window-hours 24
```
```bash
python3 .codex/skills/codex-issue-digest/scripts/collect_issue_digest.py --all-labels --window "past week" --limit-issues 10
```
Run the focused script tests:
```bash
pytest .codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py
```

View File

@@ -0,0 +1,4 @@
interface:
display_name: "Codex Issue Digest"
short_description: "Summarize Codex issues by labels or all areas"
default_prompt: "Use $codex-issue-digest to run the Codex issue digest for labels tui and exec over the previous 24 hours."

View File

@@ -0,0 +1,994 @@
#!/usr/bin/env python3
"""Collect recent openai/codex issue activity for owner-focused digests."""
import argparse
import json
import math
import re
import subprocess
import sys
from datetime import datetime, timedelta, timezone
from pathlib import Path
from urllib.parse import quote
SCRIPT_VERSION = 4
QUALIFYING_KIND_LABELS = ("bug", "enhancement")
REACTION_KEYS = ("+1", "-1", "laugh", "hooray", "confused", "heart", "rocket", "eyes")
BASE_ATTENTION_WINDOW_HOURS = 24.0
ONE_ATTENTION_INTERACTION_THRESHOLD = 5
TWO_ATTENTION_INTERACTION_THRESHOLD = 10
ALL_LABEL_PHRASES = {"all", "all areas", "all labels", "all-areas", "all-labels", "*"}
class GhCommandError(RuntimeError):
pass
def parse_args():
parser = argparse.ArgumentParser(
description="Collect recent GitHub issue activity for a Codex owner digest."
)
parser.add_argument(
"--repo", default="openai/codex", help="OWNER/REPO, default openai/codex"
)
parser.add_argument(
"--labels",
nargs="+",
default=[],
help="Feature-area labels owned by the digest recipient, for example: tui exec",
)
parser.add_argument(
"--all-labels",
action="store_true",
help="Collect bug/enhancement issues across all feature-area labels",
)
parser.add_argument(
"--window",
help='Lookback duration such as "24h", "7d", "1w", or "past week"',
)
parser.add_argument(
"--window-hours", type=float, default=24.0, help="Lookback window"
)
parser.add_argument(
"--since", help="UTC ISO timestamp override for the window start"
)
parser.add_argument("--until", help="UTC ISO timestamp override for the window end")
parser.add_argument(
"--limit-issues",
type=int,
default=200,
help="Maximum candidate issues to hydrate after search",
)
parser.add_argument(
"--body-chars", type=int, default=1200, help="Issue body excerpt length"
)
parser.add_argument(
"--comment-chars", type=int, default=900, help="Comment excerpt length"
)
parser.add_argument(
"--max-comment-pages",
type=int,
default=3,
help=(
"Maximum pages of issue comments to hydrate per issue after applying the "
"window filter. Use 0 with --fetch-all-comments for no page cap."
),
)
parser.add_argument(
"--fetch-all-comments",
action="store_true",
help="Hydrate complete issue comment histories instead of only window-updated comments.",
)
return parser.parse_args()
def parse_timestamp(value, arg_name):
if value is None:
return None
normalized = value.strip()
if not normalized:
return None
if normalized.endswith("Z"):
normalized = f"{normalized[:-1]}+00:00"
try:
parsed = datetime.fromisoformat(normalized)
except ValueError as err:
raise ValueError(f"{arg_name} must be an ISO timestamp") from err
if parsed.tzinfo is None:
parsed = parsed.replace(tzinfo=timezone.utc)
return parsed.astimezone(timezone.utc)
def format_timestamp(value):
return (
value.astimezone(timezone.utc)
.replace(microsecond=0)
.isoformat()
.replace("+00:00", "Z")
)
def resolve_window(args):
until = parse_timestamp(args.until, "--until") or datetime.now(timezone.utc)
since = parse_timestamp(args.since, "--since")
if since is None:
hours = parse_duration_hours(getattr(args, "window", None))
if hours is None:
hours = getattr(args, "window_hours", 24.0)
if hours <= 0:
raise ValueError("window duration must be > 0")
since = until - timedelta(hours=hours)
if since >= until:
raise ValueError("--since must be before --until")
return since, until
def parse_duration_hours(value):
if value is None:
return None
text = value.strip().casefold().replace("_", " ")
if not text:
return None
text = re.sub(r"^(past|last)\s+", "", text)
aliases = {
"day": 24.0,
"24h": 24.0,
"week": 168.0,
"7d": 168.0,
}
if text in aliases:
return aliases[text]
match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(h|hr|hrs|hour|hours)", text)
if match:
return float(match.group(1))
match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(d|day|days)", text)
if match:
return float(match.group(1)) * 24.0
match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(w|week|weeks)", text)
if match:
return float(match.group(1)) * 168.0
raise ValueError(f"Unsupported duration: {value}")
def normalize_requested_labels(labels, all_labels=False):
out = []
seen = set()
for raw in labels:
for piece in raw.split(","):
label = piece.strip()
if not label:
continue
key = label.casefold()
if key not in seen:
out.append(label)
seen.add(key)
phrase = " ".join(label.casefold() for label in out)
if all_labels or phrase in ALL_LABEL_PHRASES:
return [], True
if not out:
raise ValueError(
"At least one feature-area label is required, or use --all-labels"
)
return out, False
def quote_label(label):
if re.fullmatch(r"[A-Za-z0-9_.:-]+", label):
return f"label:{label}"
escaped = label.replace('"', '\\"')
return f'label:"{escaped}"'
def build_search_queries(
repo, owner_labels, since, kind_labels=QUALIFYING_KIND_LABELS, all_labels=False
):
since_date = since.date().isoformat()
queries = []
if all_labels:
for kind_label in kind_labels:
queries.append(
" ".join(
[
f"repo:{repo}",
"is:issue",
f"updated:>={since_date}",
quote_label(kind_label),
]
)
)
return queries
for owner_label in owner_labels:
for kind_label in kind_labels:
queries.append(
" ".join(
[
f"repo:{repo}",
"is:issue",
f"updated:>={since_date}",
quote_label(owner_label),
quote_label(kind_label),
]
)
)
return queries
def _format_gh_error(cmd, err):
stdout = (err.stdout or "").strip()
stderr = (err.stderr or "").strip()
parts = [f"GitHub CLI command failed: {' '.join(cmd)}"]
if stdout:
parts.append(f"stdout: {stdout}")
if stderr:
parts.append(f"stderr: {stderr}")
return "\n".join(parts)
def gh_json(args):
cmd = ["gh", *args]
try:
proc = subprocess.run(cmd, check=True, capture_output=True, text=True)
except FileNotFoundError as err:
raise GhCommandError("`gh` command not found") from err
except subprocess.CalledProcessError as err:
raise GhCommandError(_format_gh_error(cmd, err)) from err
raw = proc.stdout.strip()
if not raw:
return None
try:
return json.loads(raw)
except json.JSONDecodeError as err:
raise GhCommandError(
f"Failed to parse JSON from gh output for {' '.join(args)}"
) from err
def gh_text(args):
cmd = ["gh", *args]
try:
proc = subprocess.run(cmd, check=True, capture_output=True, text=True)
except (FileNotFoundError, subprocess.CalledProcessError):
return ""
return proc.stdout.strip()
def git_head():
try:
proc = subprocess.run(
["git", "rev-parse", "--short=12", "HEAD"],
check=True,
capture_output=True,
text=True,
)
except (FileNotFoundError, subprocess.CalledProcessError):
return None
return proc.stdout.strip() or None
def skill_relative_path():
try:
return str(Path(__file__).resolve().relative_to(Path.cwd().resolve()))
except ValueError:
return str(Path(__file__).resolve())
def gh_api_list_paginated(endpoint, per_page=100, max_pages=None, with_metadata=False):
items = []
page = 1
truncated = False
while True:
sep = "&" if "?" in endpoint else "?"
page_endpoint = f"{endpoint}{sep}per_page={per_page}&page={page}"
payload = gh_json(["api", page_endpoint])
if payload is None:
break
if not isinstance(payload, list):
raise GhCommandError(f"Unexpected paginated payload from gh api {endpoint}")
items.extend(payload)
if len(payload) < per_page:
break
if max_pages is not None and page >= max_pages:
truncated = True
break
page += 1
if with_metadata:
return {
"items": items,
"truncated": truncated,
"pages": page,
"max_pages": max_pages,
}
return items
def search_issue_numbers(queries, limit):
numbers = {}
for query in queries:
page = 1
seen_for_query = 0
while True:
payload = gh_json(
[
"api",
"search/issues",
"-X",
"GET",
"-f",
f"q={query}",
"-f",
"sort=updated",
"-f",
"order=desc",
"-f",
"per_page=100",
"-f",
f"page={page}",
]
)
if not isinstance(payload, dict):
raise GhCommandError("Unexpected payload from GitHub issue search")
items = payload.get("items") or []
if not isinstance(items, list):
raise GhCommandError("Expected search `items` to be a list")
for item in items:
if not isinstance(item, dict):
continue
number = item.get("number")
if isinstance(number, int):
numbers[number] = str(item.get("updated_at") or "")
seen_for_query += 1
if len(items) < 100 or seen_for_query >= limit:
break
page += 1
ordered = sorted(
numbers, key=lambda number: (numbers[number], number), reverse=True
)
return ordered[:limit]
def fetch_issue(repo, number):
payload = gh_json(["api", f"repos/{repo}/issues/{number}"])
if not isinstance(payload, dict):
raise GhCommandError(f"Unexpected issue payload for #{number}")
return payload
def fetch_comments(repo, number, since=None, max_pages=None):
endpoint = f"repos/{repo}/issues/{number}/comments"
if since is not None:
endpoint = f"{endpoint}?since={quote(format_timestamp(since), safe='')}"
return gh_api_list_paginated(
endpoint,
max_pages=max_pages,
with_metadata=True,
)
def fetch_reactions_for_item(endpoint, item):
if reaction_summary(item)["total"] <= 0:
return []
return gh_api_list_paginated(endpoint)
def fetch_comment_reactions(repo, comments):
reactions_by_comment_id = {}
for comment in comments:
comment_id = comment.get("id")
if comment_id in (None, ""):
continue
endpoint = f"repos/{repo}/issues/comments/{comment_id}/reactions"
reactions_by_comment_id[comment_id] = fetch_reactions_for_item(
endpoint, comment
)
return reactions_by_comment_id
def extract_login(user_obj):
if isinstance(user_obj, dict):
return str(user_obj.get("login") or "")
return ""
def is_bot_login(login):
return bool(login) and login.lower().endswith("[bot]")
def is_human_user(user_obj):
login = extract_login(user_obj)
return bool(login) and not is_bot_login(login)
def label_names(issue):
labels = []
for label in issue.get("labels") or []:
if isinstance(label, dict) and label.get("name"):
labels.append(str(label["name"]))
return sorted(labels, key=str.casefold)
def matching_labels(labels, requested):
labels_by_key = {label.casefold(): label for label in labels}
return [label for label in requested if label.casefold() in labels_by_key]
def area_labels(labels):
kind_keys = {label.casefold() for label in QUALIFYING_KIND_LABELS}
return [label for label in labels if label.casefold() not in kind_keys]
def attention_thresholds_for_window(window_hours):
if window_hours <= 0:
raise ValueError("window_hours must be > 0")
window_hours = round(window_hours, 6)
scale = window_hours / BASE_ATTENTION_WINDOW_HOURS
elevated = max(1, math.ceil(ONE_ATTENTION_INTERACTION_THRESHOLD * scale))
very_high = max(
elevated + 1, math.ceil(TWO_ATTENTION_INTERACTION_THRESHOLD * scale)
)
return {
"base_window_hours": BASE_ATTENTION_WINDOW_HOURS,
"window_hours": round(window_hours, 3),
"scale": round(scale, 3),
"elevated": elevated,
"very_high": very_high,
}
def attention_level_for(user_interactions, attention_thresholds=None):
thresholds = attention_thresholds or attention_thresholds_for_window(
BASE_ATTENTION_WINDOW_HOURS
)
if user_interactions >= thresholds["very_high"]:
return 2
if user_interactions >= thresholds["elevated"]:
return 1
return 0
def attention_marker_for(user_interactions, attention_thresholds=None):
return "🔥" * attention_level_for(user_interactions, attention_thresholds)
def reaction_summary(item):
reactions = item.get("reactions")
if not isinstance(reactions, dict):
return {"total": 0, "counts": {}}
counts = {}
for key in REACTION_KEYS:
value = reactions.get(key, 0)
if isinstance(value, int) and value:
counts[key] = value
total = reactions.get("total_count")
if not isinstance(total, int):
total = sum(counts.values())
return {"total": total, "counts": counts}
def reaction_event_summary(reactions, since, until):
counts = {}
total = 0
for reaction in reactions or []:
if not isinstance(reaction, dict):
continue
if not is_in_window(str(reaction.get("created_at") or ""), since, until):
continue
if not is_human_user(reaction.get("user")):
continue
content = str(reaction.get("content") or "")
if not content:
continue
counts[content] = counts.get(content, 0) + 1
total += 1
return {
"total": total,
"counts": counts,
"upvotes": counts.get("+1", 0),
}
def compact_text(value, limit):
text = re.sub(r"\s+", " ", str(value or "")).strip()
if limit <= 0:
return ""
if len(text) <= limit:
return text
return f"{text[: max(limit - 1, 0)].rstrip()}..."
def clean_title_for_description(title):
cleaned = re.sub(r"\s+", " ", str(title or "")).strip()
cleaned = re.sub(
r"^(codex(?: desktop| app|\.app| cli)?|desktop|windows codex app)\s*[:,-]\s*",
"",
cleaned,
flags=re.IGNORECASE,
)
cleaned = re.sub(r"^on windows,\s*", "Windows: ", cleaned, flags=re.IGNORECASE)
cleaned = cleaned.strip(" -:;")
return compact_text(cleaned, 80) or "Issue needs owner review"
def issue_description(issue):
return clean_title_for_description(issue.get("title"))
def is_in_window(timestamp, since, until):
parsed = parse_timestamp(timestamp, "timestamp")
if parsed is None:
return False
return since <= parsed < until
def summarize_comment(
comment, comment_chars, reaction_events=None, since=None, until=None
):
reactions = reaction_summary(comment)
new_reactions = (
reaction_event_summary(reaction_events, since, until)
if since is not None and until is not None
else {"total": 0, "counts": {}, "upvotes": 0}
)
human_user_interaction = is_human_user(comment.get("user"))
return {
"id": comment.get("id"),
"author": extract_login(comment.get("user")),
"author_association": str(comment.get("author_association") or ""),
"created_at": str(comment.get("created_at") or ""),
"updated_at": str(comment.get("updated_at") or ""),
"url": str(comment.get("html_url") or ""),
"human_user_interaction": human_user_interaction,
"reactions": reactions["counts"],
"reaction_total": reactions["total"],
"new_reactions": new_reactions["total"],
"new_upvotes": new_reactions["upvotes"],
"new_reaction_counts": new_reactions["counts"],
"body_excerpt": compact_text(comment.get("body"), comment_chars),
}
def summarize_issue(
issue,
comments,
requested_labels,
since,
until,
body_chars,
comment_chars,
issue_reaction_events=None,
comment_reactions_by_id=None,
all_labels=False,
comments_hydration=None,
attention_thresholds=None,
):
labels = label_names(issue)
labels_by_key = {label.casefold() for label in labels}
kind_labels = [
label for label in QUALIFYING_KIND_LABELS if label.casefold() in labels_by_key
]
if all_labels:
owner_labels = area_labels(labels) or ["unlabeled"]
else:
owner_labels = matching_labels(labels, requested_labels)
if not kind_labels or not owner_labels:
return None
updated_at = str(issue.get("updated_at") or "")
if not is_in_window(updated_at, since, until):
return None
new_issue = is_in_window(str(issue.get("created_at") or ""), since, until)
comment_reactions_by_id = comment_reactions_by_id or {}
new_comments = [
summarize_comment(
comment,
comment_chars,
reaction_events=comment_reactions_by_id.get(comment.get("id")),
since=since,
until=until,
)
for comment in comments
if is_in_window(str(comment.get("created_at") or ""), since, until)
]
new_comments.sort(key=lambda item: (item["created_at"], str(item["id"])))
issue_reactions = reaction_summary(issue)
issue_reaction_events_summary = reaction_event_summary(
issue_reaction_events, since, until
)
comment_reaction_events_summary = reaction_event_summary(
[
reaction
for reactions in comment_reactions_by_id.values()
for reaction in reactions
],
since,
until,
)
new_reactions = (
issue_reaction_events_summary["total"]
+ comment_reaction_events_summary["total"]
)
new_upvotes = (
issue_reaction_events_summary["upvotes"]
+ comment_reaction_events_summary["upvotes"]
)
all_comment_reaction_total = sum(
reaction_summary(comment)["total"] for comment in comments
)
new_comment_reaction_total = sum(
comment["reaction_total"] for comment in new_comments
)
new_issue_user_interaction = new_issue and is_human_user(issue.get("user"))
new_comment_user_interactions = sum(
1 for comment in new_comments if comment["human_user_interaction"]
)
user_interactions = (
int(new_issue_user_interaction) + new_comment_user_interactions + new_reactions
)
attention_level = attention_level_for(user_interactions, attention_thresholds)
attention_marker = attention_marker_for(user_interactions, attention_thresholds)
updated_without_visible_new_post = (
not new_issue and not new_comments and new_reactions == 0
)
engagement_score = (
len(new_comments) * 3
+ new_reactions
+ issue_reactions["total"]
+ new_comment_reaction_total
+ min(int(issue.get("comments") or len(comments) or 0), 10)
)
return {
"number": issue.get("number"),
"title": str(issue.get("title") or ""),
"description": issue_description(issue),
"url": str(issue.get("html_url") or ""),
"state": str(issue.get("state") or ""),
"author": extract_login(issue.get("user")),
"author_association": str(issue.get("author_association") or ""),
"created_at": str(issue.get("created_at") or ""),
"updated_at": updated_at,
"labels": labels,
"kind_labels": kind_labels,
"owner_labels": owner_labels,
"comments_total": int(issue.get("comments") or len(comments) or 0),
"comments_hydration": comments_hydration
or {
"fetched": len(comments),
"since": None,
"truncated": False,
"max_pages": None,
},
"issue_reactions": issue_reactions["counts"],
"issue_reaction_total": issue_reactions["total"],
"comment_reaction_total": all_comment_reaction_total,
"new_comment_reaction_total": new_comment_reaction_total,
"new_issue_reactions": issue_reaction_events_summary["total"],
"new_issue_upvotes": issue_reaction_events_summary["upvotes"],
"new_comment_reactions": comment_reaction_events_summary["total"],
"new_comment_upvotes": comment_reaction_events_summary["upvotes"],
"new_reactions": new_reactions,
"new_upvotes": new_upvotes,
"user_interactions": user_interactions,
"attention": attention_level > 0,
"attention_level": attention_level,
"attention_marker": attention_marker,
"engagement_score": engagement_score,
"activity": {
"new_issue": new_issue,
"new_comments": len(new_comments),
"new_human_comments": new_comment_user_interactions,
"new_reactions": new_reactions,
"new_upvotes": new_upvotes,
"updated_without_visible_new_post": updated_without_visible_new_post,
},
"body_excerpt": compact_text(issue.get("body"), body_chars),
"new_comments": new_comments,
}
def count_by_label(issues, labels):
out = {}
for label in labels:
matching = [issue for issue in issues if label in issue["owner_labels"]]
out[label] = {
"issues": len(matching),
"new_issues": sum(
1 for issue in matching if issue["activity"]["new_issue"]
),
"new_comments": sum(
issue["activity"]["new_comments"] for issue in matching
),
}
return out
def count_by_kind(issues):
out = {}
for kind in QUALIFYING_KIND_LABELS:
matching = [issue for issue in issues if kind in issue["kind_labels"]]
out[kind] = {
"issues": len(matching),
"new_issues": sum(
1 for issue in matching if issue["activity"]["new_issue"]
),
"new_comments": sum(
issue["activity"]["new_comments"] for issue in matching
),
}
return out
def hot_items(issues, limit=8):
ranked = sorted(
issues,
key=lambda issue: (
issue["attention"],
issue["attention_level"],
issue["user_interactions"],
issue["engagement_score"],
issue["activity"]["new_comments"],
issue["issue_reaction_total"] + issue["comment_reaction_total"],
issue["updated_at"],
),
reverse=True,
)
return [
{
"number": issue["number"],
"title": issue["title"],
"url": issue["url"],
"owner_labels": issue["owner_labels"],
"kind_labels": issue["kind_labels"],
"attention": issue["attention"],
"attention_level": issue["attention_level"],
"attention_marker": issue["attention_marker"],
"user_interactions": issue["user_interactions"],
"new_reactions": issue["new_reactions"],
"new_upvotes": issue["new_upvotes"],
"engagement_score": issue["engagement_score"],
"new_comments": issue["activity"]["new_comments"],
"reaction_total": issue["issue_reaction_total"]
+ issue["comment_reaction_total"],
}
for issue in ranked[:limit]
if issue["engagement_score"] > 0
]
def ranked_digest_issues(issues):
return sorted(
issues,
key=lambda issue: (
issue["attention"],
issue["attention_level"],
issue["user_interactions"],
issue["engagement_score"],
issue["activity"]["new_comments"],
issue["updated_at"],
),
reverse=True,
)
def digest_rows(issues, limit=10, ref_map=None):
ranked = ranked_digest_issues(issues)
if ref_map is None:
ref_map = {issue["number"]: ref for ref, issue in enumerate(ranked, start=1)}
rows = []
for issue in ranked[:limit]:
ref = ref_map[issue["number"]]
reaction_total = issue["issue_reaction_total"] + issue["comment_reaction_total"]
rows.append(
{
"ref": ref,
"ref_markdown": f"[{ref}]({issue['url']})",
"marker": issue["attention_marker"],
"attention_marker": issue["attention_marker"],
"number": issue["number"],
"description": issue["description"],
"title": issue["title"],
"url": issue["url"],
"area": ", ".join(issue["owner_labels"]),
"kind": ", ".join(issue["kind_labels"]),
"state": issue["state"],
"interactions": issue["user_interactions"],
"user_interactions": issue["user_interactions"],
"new_reactions": issue["new_reactions"],
"new_upvotes": issue["new_upvotes"],
"current_reactions": reaction_total,
}
)
return rows
def issue_ref_markdown(issue, ref_map):
ref = ref_map[issue["number"]]
return f"[{ref}]({issue['url']})"
def summary_inputs(issues, limit=80, ref_map=None):
ranked = ranked_digest_issues(issues)
if ref_map is None:
ref_map = {issue["number"]: ref for ref, issue in enumerate(ranked, start=1)}
rows = []
for issue in ranked[:limit]:
rows.append(
{
"ref": ref_map[issue["number"]],
"ref_markdown": issue_ref_markdown(issue, ref_map),
"number": issue["number"],
"title": issue["title"],
"description": issue["description"],
"url": issue["url"],
"labels": issue["labels"],
"owner_labels": issue["owner_labels"],
"kind_labels": issue["kind_labels"],
"state": issue.get("state", ""),
"attention_marker": issue.get("attention_marker", ""),
"interactions": issue["user_interactions"],
"new_comments": issue["activity"].get("new_comments", 0),
"new_reactions": issue.get("new_reactions", 0),
"new_upvotes": issue.get("new_upvotes", 0),
"current_reactions": issue.get("issue_reaction_total", 0)
+ issue.get("comment_reaction_total", 0),
}
)
return rows
def collect_digest(args):
since, until = resolve_window(args)
window_hours = (until - since).total_seconds() / 3600
attention_thresholds = attention_thresholds_for_window(window_hours)
requested_labels, all_labels = normalize_requested_labels(
args.labels, all_labels=args.all_labels
)
queries = build_search_queries(
args.repo, requested_labels, since, all_labels=all_labels
)
numbers = search_issue_numbers(queries, args.limit_issues)
gh_version_output = gh_text(["--version"])
issues = []
max_comment_pages = None if args.max_comment_pages <= 0 else args.max_comment_pages
for number in numbers:
issue = fetch_issue(args.repo, number)
comments_since = None if args.fetch_all_comments else since
comments_payload = fetch_comments(
args.repo,
number,
since=comments_since,
max_pages=max_comment_pages,
)
comments = comments_payload["items"]
issue_reaction_events = fetch_reactions_for_item(
f"repos/{args.repo}/issues/{number}/reactions", issue
)
comment_reactions_by_id = fetch_comment_reactions(args.repo, comments)
comments_hydration = {
"fetched": len(comments),
"total": int(issue.get("comments") or len(comments) or 0),
"since": format_timestamp(comments_since) if comments_since else None,
"truncated": comments_payload["truncated"],
"max_pages": comments_payload["max_pages"],
"fetch_all_comments": args.fetch_all_comments,
}
summary = summarize_issue(
issue,
comments,
requested_labels,
since,
until,
args.body_chars,
args.comment_chars,
issue_reaction_events=issue_reaction_events,
comment_reactions_by_id=comment_reactions_by_id,
all_labels=all_labels,
comments_hydration=comments_hydration,
attention_thresholds=attention_thresholds,
)
if summary is not None:
issues.append(summary)
issues.sort(
key=lambda issue: (issue["updated_at"], int(issue["number"] or 0)), reverse=True
)
totals = {
"candidate_issues": len(numbers),
"included_issues": len(issues),
"new_issues": sum(1 for issue in issues if issue["activity"]["new_issue"]),
"issues_with_new_comments": sum(
1 for issue in issues if issue["activity"]["new_comments"] > 0
),
"new_comments": sum(issue["activity"]["new_comments"] for issue in issues),
"comments_fetched": sum(
issue["comments_hydration"]["fetched"] for issue in issues
),
"issues_with_truncated_comment_hydration": sum(
1 for issue in issues if issue["comments_hydration"]["truncated"]
),
"updated_without_visible_new_post": sum(
1
for issue in issues
if issue["activity"]["updated_without_visible_new_post"]
),
"issue_reactions_current_total": sum(
issue["issue_reaction_total"] for issue in issues
),
"comment_reactions_current_total": sum(
issue["comment_reaction_total"] for issue in issues
),
"new_reactions": sum(issue["new_reactions"] for issue in issues),
"new_upvotes": sum(issue["new_upvotes"] for issue in issues),
"user_interactions": sum(issue["user_interactions"] for issue in issues),
}
ranked = ranked_digest_issues(issues)
ref_map = {issue["number"]: ref for ref, issue in enumerate(ranked, start=1)}
filter_label = "all" if all_labels else requested_labels
return {
"generated_at": format_timestamp(datetime.now(timezone.utc)),
"source": {
"repo": args.repo,
"skill": "codex-issue-digest",
"collector": skill_relative_path(),
"script_version": SCRIPT_VERSION,
"git_head": git_head(),
"gh_version": gh_version_output.splitlines()[0]
if gh_version_output
else None,
},
"window": {
"since": format_timestamp(since),
"until": format_timestamp(until),
"hours": round(window_hours, 3),
},
"attention_thresholds": attention_thresholds,
"filters": {
"owner_labels": filter_label,
"all_labels": all_labels,
"kind_labels": list(QUALIFYING_KIND_LABELS),
},
"collection_notes": [
"Issues are selected when they currently have bug or enhancement plus at least one requested owner label and were updated during the window.",
"By default, issue comments are fetched with since=window_start and a max page cap to avoid long historical threads; use --fetch-all-comments when exhaustive comment history is needed.",
"New issue comments are filtered by comment creation time within the window from the fetched comment set.",
"Reaction events are counted by GitHub reaction created_at timestamps for hydrated issues and fetched comments.",
"Current reaction totals are standing engagement signals; new_reactions and new_upvotes are windowed activity.",
"The collector does not assign semantic clusters; use summary_inputs as model-ready evidence for report-time clustering.",
"Pure reaction-only issues may be missed if GitHub issue search does not surface them via updated_at.",
"Issues updated during the window without a new issue body or new comment are retained because label/status edits can still be useful owner signals.",
],
"totals": totals,
"by_owner_label": count_by_label(
issues,
sorted(
{area for issue in issues for area in issue["owner_labels"]},
key=str.casefold,
)
if all_labels
else requested_labels,
),
"by_kind_label": count_by_kind(issues),
"hot_items": hot_items(issues),
"summary_inputs": summary_inputs(issues, ref_map=ref_map),
"digest_rows": digest_rows(issues, ref_map=ref_map),
"issues": issues,
}
def main():
args = parse_args()
try:
digest = collect_digest(args)
except (GhCommandError, RuntimeError, ValueError) as err:
sys.stderr.write(f"collect_issue_digest.py error: {err}\n")
return 1
sys.stdout.write(json.dumps(digest, indent=2, sort_keys=True) + "\n")
return 0
if __name__ == "__main__":
raise SystemExit(main())

View File

@@ -0,0 +1,685 @@
import importlib.util
from datetime import timezone
from pathlib import Path
MODULE_PATH = Path(__file__).with_name("collect_issue_digest.py")
MODULE_SPEC = importlib.util.spec_from_file_location(
"collect_issue_digest", MODULE_PATH
)
collect_issue_digest = importlib.util.module_from_spec(MODULE_SPEC)
assert MODULE_SPEC.loader is not None
MODULE_SPEC.loader.exec_module(collect_issue_digest)
def test_build_search_queries_uses_each_owner_and_kind_label():
since = collect_issue_digest.parse_timestamp("2026-04-25T12:34:56Z", "--since")
queries = collect_issue_digest.build_search_queries(
"openai/codex", ["tui", "exec"], since
)
assert queries == [
"repo:openai/codex is:issue updated:>=2026-04-25 label:tui label:bug",
"repo:openai/codex is:issue updated:>=2026-04-25 label:tui label:enhancement",
"repo:openai/codex is:issue updated:>=2026-04-25 label:exec label:bug",
"repo:openai/codex is:issue updated:>=2026-04-25 label:exec label:enhancement",
]
def test_build_search_queries_can_scan_all_labels():
since = collect_issue_digest.parse_timestamp("2026-04-25T12:34:56Z", "--since")
queries = collect_issue_digest.build_search_queries(
"openai/codex", [], since, all_labels=True
)
assert queries == [
"repo:openai/codex is:issue updated:>=2026-04-25 label:bug",
"repo:openai/codex is:issue updated:>=2026-04-25 label:enhancement",
]
def test_normalize_requested_labels_accepts_all_area_phrases():
assert collect_issue_digest.normalize_requested_labels(["all", "areas"]) == (
[],
True,
)
assert collect_issue_digest.normalize_requested_labels(["all-labels"]) == (
[],
True,
)
def test_search_issue_numbers_requests_updated_sort(monkeypatch):
calls = []
def fake_gh_json(args):
calls.append(args)
return {
"items": [
{"number": 1, "updated_at": "2026-04-25T00:00:00Z"},
]
}
monkeypatch.setattr(collect_issue_digest, "gh_json", fake_gh_json)
assert collect_issue_digest.search_issue_numbers(["query"], limit=10) == [1]
assert "-f" in calls[0]
assert "sort=updated" in calls[0]
assert "order=desc" in calls[0]
def test_search_issue_numbers_applies_limit_per_query(monkeypatch):
calls = []
def fake_gh_json(args):
calls.append(args)
query = next(
value.removeprefix("q=") for value in args if value.startswith("q=")
)
page = int(
next(
value.removeprefix("page=")
for value in args
if value.startswith("page=")
)
)
base = 10_000 if query == "first" else 20_000
offset = (page - 1) * 100
return {
"items": [
{
"number": base + offset + idx,
"updated_at": f"2026-04-25T00:{idx:02d}:00Z",
}
for idx in range(100)
]
}
monkeypatch.setattr(collect_issue_digest, "gh_json", fake_gh_json)
collect_issue_digest.search_issue_numbers(["first", "second"], limit=150)
queried_pages = [
(
next(
value.removeprefix("q=") for value in args if value.startswith("q=")
),
next(
value.removeprefix("page=")
for value in args
if value.startswith("page=")
),
)
for args in calls
]
assert queried_pages == [
("first", "1"),
("first", "2"),
("second", "1"),
("second", "2"),
]
def test_summarize_issue_keeps_new_comments_and_reaction_signals():
since = collect_issue_digest.parse_timestamp("2026-04-25T00:00:00Z", "--since")
until = collect_issue_digest.parse_timestamp("2026-04-26T00:00:00Z", "--until")
issue = {
"number": 123,
"title": "TUI does not redraw",
"html_url": "https://github.com/openai/codex/issues/123",
"state": "open",
"created_at": "2026-04-24T20:00:00Z",
"updated_at": "2026-04-25T10:00:00Z",
"user": {"login": "alice"},
"author_association": "NONE",
"comments": 2,
"body": "The terminal freezes after resize.",
"labels": [{"name": "bug"}, {"name": "tui"}],
"reactions": {"total_count": 3, "+1": 2, "rocket": 1},
}
comments = [
{
"id": 1,
"created_at": "2026-04-25T11:00:00Z",
"updated_at": "2026-04-25T11:00:00Z",
"html_url": "https://github.com/openai/codex/issues/123#issuecomment-1",
"user": {"login": "bob"},
"author_association": "MEMBER",
"body": "I can reproduce this on main.",
"reactions": {"total_count": 4, "heart": 1, "+1": 3},
},
{
"id": 2,
"created_at": "2026-04-24T11:00:00Z",
"updated_at": "2026-04-24T11:00:00Z",
"html_url": "https://github.com/openai/codex/issues/123#issuecomment-2",
"user": {"login": "carol"},
"author_association": "NONE",
"body": "Older comment.",
"reactions": {"total_count": 1, "eyes": 1},
},
]
summary = collect_issue_digest.summarize_issue(
issue,
comments,
["tui", "exec"],
since,
until,
body_chars=200,
comment_chars=200,
)
assert summary == {
"number": 123,
"title": "TUI does not redraw",
"description": "TUI does not redraw",
"url": "https://github.com/openai/codex/issues/123",
"state": "open",
"author": "alice",
"author_association": "NONE",
"created_at": "2026-04-24T20:00:00Z",
"updated_at": "2026-04-25T10:00:00Z",
"labels": ["bug", "tui"],
"kind_labels": ["bug"],
"owner_labels": ["tui"],
"comments_total": 2,
"comments_hydration": {
"fetched": 2,
"since": None,
"truncated": False,
"max_pages": None,
},
"issue_reactions": {"+1": 2, "rocket": 1},
"issue_reaction_total": 3,
"comment_reaction_total": 5,
"new_comment_reaction_total": 4,
"new_issue_reactions": 0,
"new_issue_upvotes": 0,
"new_comment_reactions": 0,
"new_comment_upvotes": 0,
"new_reactions": 0,
"new_upvotes": 0,
"user_interactions": 1,
"attention": False,
"attention_level": 0,
"attention_marker": "",
"engagement_score": 12,
"activity": {
"new_issue": False,
"new_comments": 1,
"new_human_comments": 1,
"new_reactions": 0,
"new_upvotes": 0,
"updated_without_visible_new_post": False,
},
"body_excerpt": "The terminal freezes after resize.",
"new_comments": [
{
"id": 1,
"author": "bob",
"author_association": "MEMBER",
"created_at": "2026-04-25T11:00:00Z",
"updated_at": "2026-04-25T11:00:00Z",
"url": "https://github.com/openai/codex/issues/123#issuecomment-1",
"human_user_interaction": True,
"reactions": {"+1": 3, "heart": 1},
"reaction_total": 4,
"new_reactions": 0,
"new_upvotes": 0,
"new_reaction_counts": {},
"body_excerpt": "I can reproduce this on main.",
}
],
}
def test_summarize_issue_filters_non_owner_or_non_kind_labels():
since = collect_issue_digest.parse_timestamp("2026-04-25T00:00:00Z", "--since")
until = collect_issue_digest.parse_timestamp("2026-04-26T00:00:00Z", "--until")
base_issue = {
"number": 1,
"title": "Question",
"created_at": "2026-04-25T01:00:00Z",
"updated_at": "2026-04-25T01:00:00Z",
"labels": [{"name": "question"}, {"name": "tui"}],
}
assert (
collect_issue_digest.summarize_issue(
base_issue,
[],
["tui"],
since,
until,
body_chars=100,
comment_chars=100,
)
is None
)
issue_without_owner = dict(base_issue)
issue_without_owner["labels"] = [{"name": "bug"}, {"name": "app"}]
assert (
collect_issue_digest.summarize_issue(
issue_without_owner,
[],
["tui"],
since,
until,
body_chars=100,
comment_chars=100,
)
is None
)
def test_resolve_window_defaults_to_previous_hours():
class Args:
since = None
until = "2026-04-26T12:00:00Z"
window_hours = 24
since, until = collect_issue_digest.resolve_window(Args())
assert since.isoformat() == "2026-04-25T12:00:00+00:00"
assert until.tzinfo == timezone.utc
def test_parse_duration_hours_accepts_common_phrases():
assert collect_issue_digest.parse_duration_hours("past week") == 168
assert collect_issue_digest.parse_duration_hours("48h") == 48
assert collect_issue_digest.parse_duration_hours("2 days") == 48
assert collect_issue_digest.parse_duration_hours("1w") == 168
def test_attention_thresholds_scale_by_window_length():
one_day = collect_issue_digest.attention_thresholds_for_window(24)
assert one_day["elevated"] == 5
assert one_day["very_high"] == 10
half_day = collect_issue_digest.attention_thresholds_for_window(12)
assert half_day["elevated"] == 3
assert half_day["very_high"] == 5
week = collect_issue_digest.attention_thresholds_for_window(168)
assert week["elevated"] == 35
assert week["very_high"] == 70
assert collect_issue_digest.attention_marker_for(34, week) == ""
assert collect_issue_digest.attention_marker_for(35, week) == "🔥"
assert collect_issue_digest.attention_marker_for(70, week) == "🔥🔥"
def test_fetch_comments_uses_since_filter_and_page_cap(monkeypatch):
calls = []
def fake_gh_json(args):
calls.append(args)
return [{"id": idx} for idx in range(100)]
monkeypatch.setattr(collect_issue_digest, "gh_json", fake_gh_json)
since = collect_issue_digest.parse_timestamp("2026-04-25T00:00:00Z", "--since")
payload = collect_issue_digest.fetch_comments(
"openai/codex", 123, since=since, max_pages=1
)
assert len(payload["items"]) == 100
assert payload["truncated"] is True
assert payload["max_pages"] == 1
assert calls == [
[
"api",
"repos/openai/codex/issues/123/comments?since=2026-04-25T00%3A00%3A00Z&per_page=100&page=1",
]
]
def test_issue_description_prefers_title_over_body_noise():
issue = {
"title": "Codex.app GUI: MCP child processes not reaped after task completion",
"body": "A later crash mention should not override the title-level symptom.",
"labels": [{"name": "app"}, {"name": "bug"}],
}
description = collect_issue_digest.issue_description(issue)
assert "MCP child processes" in description
assert "crash" not in description.casefold()
def test_attention_markers_count_human_user_interactions():
since = collect_issue_digest.parse_timestamp("2026-04-25T00:00:00Z", "--since")
until = collect_issue_digest.parse_timestamp("2026-04-26T00:00:00Z", "--until")
issue = {
"number": 456,
"title": "Agent context is exploding",
"html_url": "https://github.com/openai/codex/issues/456",
"state": "open",
"created_at": "2026-04-25T01:00:00Z",
"updated_at": "2026-04-25T12:00:00Z",
"user": {"login": "alice"},
"labels": [{"name": "bug"}, {"name": "agent"}],
}
comments = [
{
"id": idx,
"created_at": "2026-04-25T02:00:00Z",
"updated_at": "2026-04-25T02:00:00Z",
"user": {"login": f"user-{idx}"},
"body": "same here",
}
for idx in range(4)
]
comments.append(
{
"id": 99,
"created_at": "2026-04-25T02:00:00Z",
"updated_at": "2026-04-25T02:00:00Z",
"user": {"login": "github-actions[bot]"},
"body": "duplicate bot note",
}
)
summary = collect_issue_digest.summarize_issue(
issue,
comments,
["agent"],
since,
until,
body_chars=100,
comment_chars=100,
)
assert summary["user_interactions"] == 5
assert summary["activity"]["new_human_comments"] == 4
assert summary["attention"] is True
assert summary["attention_level"] == 1
assert summary["attention_marker"] == "🔥"
issue["created_at"] = "2026-04-24T01:00:00Z"
comments.extend(
{
"id": idx,
"created_at": "2026-04-25T03:00:00Z",
"updated_at": "2026-04-25T03:00:00Z",
"user": {"login": f"extra-user-{idx}"},
"body": "also seeing this",
}
for idx in range(100, 106)
)
summary = collect_issue_digest.summarize_issue(
issue,
comments,
["agent"],
since,
until,
body_chars=100,
comment_chars=100,
)
assert summary["user_interactions"] == 10
assert summary["attention_level"] == 2
assert summary["attention_marker"] == "🔥🔥"
def test_reactions_count_toward_attention_markers():
since = collect_issue_digest.parse_timestamp("2026-04-25T00:00:00Z", "--since")
until = collect_issue_digest.parse_timestamp("2026-04-26T00:00:00Z", "--until")
issue = {
"number": 789,
"title": "Support 1M token context",
"html_url": "https://github.com/openai/codex/issues/789",
"state": "open",
"created_at": "2026-04-24T01:00:00Z",
"updated_at": "2026-04-25T12:00:00Z",
"user": {"login": "alice"},
"labels": [{"name": "enhancement"}, {"name": "context"}],
"reactions": {"total_count": 20, "+1": 20},
}
comments = [
{
"id": 1,
"created_at": "2026-04-25T02:00:00Z",
"updated_at": "2026-04-25T02:00:00Z",
"user": {"login": "commenter"},
"body": "please",
"reactions": {"total_count": 2, "+1": 2},
}
]
issue_reactions = [
{
"content": "+1",
"created_at": "2026-04-25T03:00:00Z",
"user": {"login": f"reactor-{idx}"},
}
for idx in range(18)
]
comment_reactions_by_id = {
1: [
{
"content": "heart",
"created_at": "2026-04-25T04:00:00Z",
"user": {"login": "human-reactor"},
},
{
"content": "+1",
"created_at": "2026-04-25T04:00:00Z",
"user": {"login": "github-actions[bot]"},
},
]
}
summary = collect_issue_digest.summarize_issue(
issue,
comments,
["context"],
since,
until,
body_chars=100,
comment_chars=100,
issue_reaction_events=issue_reactions,
comment_reactions_by_id=comment_reactions_by_id,
)
assert summary["new_reactions"] == 19
assert summary["new_upvotes"] == 18
assert summary["user_interactions"] == 20
assert summary["attention_level"] == 2
assert summary["attention_marker"] == "🔥🔥"
assert summary["new_comments"][0]["new_reactions"] == 1
assert summary["new_comments"][0]["new_upvotes"] == 0
def test_digest_rows_are_table_ready_with_concise_descriptions():
rows = collect_issue_digest.digest_rows(
[
{
"number": 1,
"title": "Quiet bug",
"description": "Quiet bug",
"url": "https://github.com/openai/codex/issues/1",
"owner_labels": ["context"],
"kind_labels": ["bug"],
"state": "open",
"attention": False,
"attention_level": 0,
"attention_marker": "",
"user_interactions": 1,
"new_reactions": 0,
"new_upvotes": 0,
"engagement_score": 3,
"issue_reaction_total": 0,
"comment_reaction_total": 0,
"updated_at": "2026-04-25T01:00:00Z",
"activity": {
"new_issue": True,
"new_comments": 0,
"new_reactions": 0,
"updated_without_visible_new_post": False,
},
},
{
"number": 2,
"title": "Busy bug",
"description": "High-volume bug report",
"url": "https://github.com/openai/codex/issues/2",
"owner_labels": ["agent"],
"kind_labels": ["bug"],
"state": "open",
"attention": True,
"attention_level": 1,
"attention_marker": "🔥",
"user_interactions": 17,
"new_reactions": 3,
"new_upvotes": 2,
"engagement_score": 20,
"issue_reaction_total": 5,
"comment_reaction_total": 2,
"updated_at": "2026-04-25T02:00:00Z",
"activity": {
"new_issue": False,
"new_comments": 16,
"new_reactions": 3,
"updated_without_visible_new_post": False,
},
},
]
)
assert rows[0] == {
"ref": 1,
"ref_markdown": "[1](https://github.com/openai/codex/issues/2)",
"marker": "🔥",
"attention_marker": "🔥",
"number": 2,
"description": "High-volume bug report",
"title": "Busy bug",
"url": "https://github.com/openai/codex/issues/2",
"area": "agent",
"kind": "bug",
"state": "open",
"interactions": 17,
"user_interactions": 17,
"new_reactions": 3,
"new_upvotes": 2,
"current_reactions": 7,
}
def test_summary_inputs_are_model_ready_without_preclustering():
issues = [
{
"number": 20,
"title": "Windows app Browser Use external navigation fails",
"description": "Browser Use navigation or app-server failure",
"url": "https://github.com/openai/codex/issues/20",
"labels": ["app", "bug"],
"owner_labels": ["app"],
"kind_labels": ["bug"],
"attention": False,
"attention_level": 0,
"attention_marker": "",
"user_interactions": 3,
"new_reactions": 1,
"engagement_score": 8,
"updated_at": "2026-04-25T04:00:00Z",
"activity": {"new_comments": 2},
},
{
"number": 21,
"title": "On Windows, cmake output waits until timeout",
"description": "Windows command timeout/capture problem",
"url": "https://github.com/openai/codex/issues/21",
"labels": ["app", "bug"],
"owner_labels": ["app"],
"kind_labels": ["bug"],
"attention": False,
"attention_level": 0,
"attention_marker": "",
"user_interactions": 3,
"new_reactions": 0,
"engagement_score": 7,
"updated_at": "2026-04-25T03:00:00Z",
"activity": {"new_comments": 3},
},
{
"number": 22,
"title": "Windows computer use tool fails to click buttons",
"description": "Computer-use workflow failure",
"url": "https://github.com/openai/codex/issues/22",
"labels": ["app", "bug"],
"owner_labels": ["app"],
"kind_labels": ["bug"],
"attention": False,
"attention_level": 0,
"attention_marker": "",
"user_interactions": 3,
"new_reactions": 0,
"engagement_score": 6,
"updated_at": "2026-04-25T02:00:00Z",
"activity": {"new_comments": 3},
},
]
rows = collect_issue_digest.summary_inputs(issues, ref_map={20: 1, 21: 2, 22: 3})
assert rows == [
{
"ref": 1,
"ref_markdown": "[1](https://github.com/openai/codex/issues/20)",
"number": 20,
"title": "Windows app Browser Use external navigation fails",
"description": "Browser Use navigation or app-server failure",
"url": "https://github.com/openai/codex/issues/20",
"labels": ["app", "bug"],
"owner_labels": ["app"],
"kind_labels": ["bug"],
"state": "",
"attention_marker": "",
"interactions": 3,
"new_comments": 2,
"new_reactions": 1,
"new_upvotes": 0,
"current_reactions": 0,
},
{
"ref": 2,
"ref_markdown": "[2](https://github.com/openai/codex/issues/21)",
"number": 21,
"title": "On Windows, cmake output waits until timeout",
"description": "Windows command timeout/capture problem",
"url": "https://github.com/openai/codex/issues/21",
"labels": ["app", "bug"],
"owner_labels": ["app"],
"kind_labels": ["bug"],
"state": "",
"attention_marker": "",
"interactions": 3,
"new_comments": 3,
"new_reactions": 0,
"new_upvotes": 0,
"current_reactions": 0,
},
{
"ref": 3,
"ref_markdown": "[3](https://github.com/openai/codex/issues/22)",
"number": 22,
"title": "Windows computer use tool fails to click buttons",
"description": "Computer-use workflow failure",
"url": "https://github.com/openai/codex/issues/22",
"labels": ["app", "bug"],
"owner_labels": ["app"],
"kind_labels": ["bug"],
"state": "",
"attention_marker": "",
"interactions": 3,
"new_comments": 3,
"new_reactions": 0,
"new_upvotes": 0,
"current_reactions": 0,
},
]

View File

@@ -4,9 +4,11 @@ ARG TZ
ARG DEBIAN_FRONTEND=noninteractive
ARG NODE_MAJOR=22
ARG RUST_TOOLCHAIN=1.92.0
ARG CODEX_NPM_VERSION=latest
# Keep this in sync with .devcontainer/codex-install/package.json and pnpm-lock.yaml.
ARG CODEX_NPM_VERSION=0.121.0
ENV TZ="$TZ"
ENV COREPACK_ENABLE_DOWNLOAD_PROMPT=0
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
@@ -43,12 +45,18 @@ RUN apt-get update \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
COPY .devcontainer/codex-install/package.json \
.devcontainer/codex-install/pnpm-lock.yaml \
.devcontainer/codex-install/pnpm-workspace.yaml \
/opt/codex-install/
RUN curl -fsSL "https://deb.nodesource.com/setup_${NODE_MAJOR}.x" | bash - \
&& apt-get update \
&& apt-get install -y --no-install-recommends nodejs \
&& npm install -g corepack@latest "@openai/codex@${CODEX_NPM_VERSION}" \
&& corepack enable \
&& corepack prepare pnpm@10.28.2 --activate \
&& test "$(node -p "require('/opt/codex-install/package.json').dependencies['@openai/codex']")" = "${CODEX_NPM_VERSION}" \
&& cd /opt/codex-install \
&& corepack pnpm install --prod --frozen-lockfile \
&& ln -s /opt/codex-install/node_modules/.bin/codex /usr/local/bin/codex \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

View File

@@ -0,0 +1,13 @@
{
"name": "codex-devcontainer-install",
"private": true,
"description": "Locked Codex CLI install boundary for the secure devcontainer.",
"dependencies": {
"@openai/codex": "0.121.0"
},
"engines": {
"node": ">=22",
"pnpm": ">=10.33.0"
},
"packageManager": "pnpm@10.33.0+sha512.10568bb4a6afb58c9eb3630da90cc9516417abebd3fabbe6739f0ae795728da1491e9db5a544c76ad8eb7570f5c4bb3d6c637b2cb41bfdcdb47fa823c8649319"
}

View File

@@ -0,0 +1,85 @@
lockfileVersion: '9.0'
settings:
autoInstallPeers: true
excludeLinksFromLockfile: false
importers:
.:
dependencies:
'@openai/codex':
specifier: 0.121.0
version: 0.121.0
packages:
'@openai/codex@0.121.0':
resolution: {integrity: sha512-kCJ2NeATd4QBQRmqV04ymdN1ZU3MSwnJQDm/KzjpuzGvCuUVEn7no/T2mRyxQ2x77AACqriNOyPPoM/yufyvNg==}
engines: {node: '>=16'}
hasBin: true
'@openai/codex@0.121.0-darwin-arm64':
resolution: {integrity: sha512-ZyBqIB6Fb4I0hGb/h65Vu7ePYjHSmGiqqfm+/1djEuxDPkqjfi4wkxYxNYNY+6najyNGN4UijOSTTf19eDCrqw==}
engines: {node: '>=16'}
cpu: [arm64]
os: [darwin]
'@openai/codex@0.121.0-darwin-x64':
resolution: {integrity: sha512-1/OAtdkAZ5yPI3xqaEFlHuPziS1yCqL2gOZdswE7HTmmwpIxi6Z3FCo60JWDPluIp89z4tftdjq73/OCN0YVcw==}
engines: {node: '>=16'}
cpu: [x64]
os: [darwin]
'@openai/codex@0.121.0-linux-arm64':
resolution: {integrity: sha512-2UgMmdo237o7SCMsfb529cOSEM2HFUgN6OBkv5SBLwfNY1NO2Ex6JnUjlppEXlX6/4cXfZ5qjDghVz5j/+B9zw==}
engines: {node: '>=16'}
cpu: [arm64]
os: [linux]
'@openai/codex@0.121.0-linux-x64':
resolution: {integrity: sha512-vlpNJXIqss800J+32Vy7TUZzv31n61b45OLxmsVQGFkTNLJcjFrj9jDUC7I62eC4F16gLioilefNfv4CdJQOEw==}
engines: {node: '>=16'}
cpu: [x64]
os: [linux]
'@openai/codex@0.121.0-win32-arm64':
resolution: {integrity: sha512-m88q4f3XI5npn1t6OG0nWGHWWAjO5FgjRwxh4hdujbLO6t9CiCNfhfPZIOSsoATbrCNwLC+6S77m3cjbNToPNg==}
engines: {node: '>=16'}
cpu: [arm64]
os: [win32]
'@openai/codex@0.121.0-win32-x64':
resolution: {integrity: sha512-Fp0ecVOyM+VcBi/y4HVvRzhifO9YqRiHzhV3rhtAppC7flh22WPguLC4kmvXYAR0p3RPzbo35M2CedWnkOT+cw==}
engines: {node: '>=16'}
cpu: [x64]
os: [win32]
snapshots:
'@openai/codex@0.121.0':
optionalDependencies:
'@openai/codex-darwin-arm64': '@openai/codex@0.121.0-darwin-arm64'
'@openai/codex-darwin-x64': '@openai/codex@0.121.0-darwin-x64'
'@openai/codex-linux-arm64': '@openai/codex@0.121.0-linux-arm64'
'@openai/codex-linux-x64': '@openai/codex@0.121.0-linux-x64'
'@openai/codex-win32-arm64': '@openai/codex@0.121.0-win32-arm64'
'@openai/codex-win32-x64': '@openai/codex@0.121.0-win32-x64'
'@openai/codex@0.121.0-darwin-arm64':
optional: true
'@openai/codex@0.121.0-darwin-x64':
optional: true
'@openai/codex@0.121.0-linux-arm64':
optional: true
'@openai/codex@0.121.0-linux-x64':
optional: true
'@openai/codex@0.121.0-win32-arm64':
optional: true
'@openai/codex@0.121.0-win32-x64':
optional: true

View File

@@ -0,0 +1,12 @@
packages:
- "."
minimumReleaseAge: 10080
minimumReleaseAgeExclude: []
blockExoticSubdeps: true
strictDepBuilds: true
trustPolicy: no-downgrade
trustPolicyIgnoreAfter: 10080
trustPolicyExclude: []
allowBuilds: {}

View File

@@ -8,7 +8,7 @@
"TZ": "${localEnv:TZ:UTC}",
"NODE_MAJOR": "22",
"RUST_TOOLCHAIN": "1.92.0",
"CODEX_NPM_VERSION": "latest"
"CODEX_NPM_VERSION": "0.121.0"
}
},
"runArgs": [

1
.gitattributes vendored
View File

@@ -1 +1,2 @@
codex-rs/app-server-protocol/schema/** linguist-generated
codex-rs/hooks/schema/generated/** linguist-generated

View File

@@ -7,6 +7,9 @@ inputs:
artifacts-dir:
description: Absolute path to the directory containing built binaries to sign.
required: true
binaries:
description: Space-delimited binary basenames to sign.
default: "codex codex-responses-api-proxy"
runs:
using: composite
@@ -18,6 +21,7 @@ runs:
shell: bash
env:
ARTIFACTS_DIR: ${{ inputs.artifacts-dir }}
BINARIES: ${{ inputs.binaries }}
COSIGN_EXPERIMENTAL: "1"
COSIGN_YES: "true"
COSIGN_OIDC_CLIENT_ID: "sigstore"
@@ -31,7 +35,7 @@ runs:
exit 1
fi
for binary in codex codex-responses-api-proxy; do
for binary in ${BINARIES}; do
artifact="${dest}/${binary}"
if [[ ! -f "$artifact" ]]; then
echo "Binary $artifact not found"

View File

@@ -4,6 +4,9 @@ inputs:
target:
description: Rust compilation target triple (e.g. aarch64-apple-darwin).
required: true
binaries:
description: Space-delimited binary basenames to sign and notarize.
default: "codex codex-responses-api-proxy"
sign-binaries:
description: Whether to sign and notarize the macOS binaries.
required: false
@@ -119,6 +122,7 @@ runs:
shell: bash
env:
TARGET: ${{ inputs.target }}
BINARIES: ${{ inputs.binaries }}
run: |
set -euo pipefail
@@ -134,7 +138,7 @@ runs:
entitlements_path="$GITHUB_ACTION_PATH/codex.entitlements.plist"
for binary in codex codex-responses-api-proxy; do
for binary in ${BINARIES}; do
path="codex-rs/target/${TARGET}/release/${binary}"
codesign --force --options runtime --timestamp --entitlements "$entitlements_path" --sign "$APPLE_CODESIGN_IDENTITY" "${keychain_args[@]}" "$path"
done
@@ -144,6 +148,7 @@ runs:
shell: bash
env:
TARGET: ${{ inputs.target }}
BINARIES: ${{ inputs.binaries }}
APPLE_NOTARIZATION_KEY_P8: ${{ inputs.apple-notarization-key-p8 }}
APPLE_NOTARIZATION_KEY_ID: ${{ inputs.apple-notarization-key-id }}
APPLE_NOTARIZATION_ISSUER_ID: ${{ inputs.apple-notarization-issuer-id }}
@@ -182,8 +187,9 @@ runs:
notarize_submission "$binary" "$archive_path" "$notary_key_path"
}
notarize_binary "codex"
notarize_binary "codex-responses-api-proxy"
for binary in ${BINARIES}; do
notarize_binary "${binary}"
done
- name: Sign and notarize macOS dmg
if: ${{ inputs.sign-dmg == 'true' }}

View File

@@ -8,7 +8,7 @@ inputs:
description: Logical namespace used to keep concurrent Bazel jobs from reserving the same repository cache key.
required: true
install-test-prereqs:
description: Install Node.js and DotSlash for Bazel-backed test jobs.
description: Install DotSlash for Bazel-backed test jobs.
required: false
default: "false"
outputs:

View File

@@ -0,0 +1,54 @@
name: Run argument comment lint
description: Run argument-comment-lint on codex-rs via Bazel.
inputs:
target:
description: Runner target passed to setup-bazel-ci.
required: true
buildbuddy-api-key:
description: BuildBuddy API key used by Bazel CI.
required: false
default: ""
runs:
using: composite
steps:
- uses: ./.github/actions/setup-bazel-ci
with:
target: ${{ inputs.target }}
install-test-prereqs: true
- name: Install Linux sandbox build dependencies
if: ${{ runner.os == 'Linux' }}
shell: bash
run: |
sudo DEBIAN_FRONTEND=noninteractive apt-get update
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends pkg-config libcap-dev
- name: Run argument comment lint on codex-rs via Bazel
if: ${{ runner.os != 'Windows' }}
env:
BUILDBUDDY_API_KEY: ${{ inputs.buildbuddy-api-key }}
shell: bash
run: |
bazel_targets="$(./tools/argument-comment-lint/list-bazel-targets.sh)"
./.github/scripts/run-bazel-ci.sh \
-- \
build \
--config=argument-comment-lint \
--keep_going \
--build_metadata=COMMIT_SHA=${GITHUB_SHA} \
-- \
${bazel_targets}
- name: Run argument comment lint on codex-rs via Bazel
if: ${{ runner.os == 'Windows' }}
env:
BUILDBUDDY_API_KEY: ${{ inputs.buildbuddy-api-key }}
shell: bash
run: |
./.github/scripts/run-argument-comment-lint-bazel.sh \
--config=argument-comment-lint \
--platforms=//:local_windows \
--keep_going \
--build_metadata=COMMIT_SHA=${GITHUB_SHA}

View File

@@ -5,7 +5,7 @@ inputs:
description: Target triple used for cache namespacing.
required: true
install-test-prereqs:
description: Install Node.js and DotSlash for Bazel-backed test jobs.
description: Install DotSlash for Bazel-backed test jobs.
required: false
default: "false"
outputs:
@@ -16,12 +16,6 @@ outputs:
runs:
using: composite
steps:
- name: Set up Node.js for js_repl tests
if: inputs.install-test-prereqs == 'true'
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6
with:
node-version-file: codex-rs/node-version.txt
# Some integration tests rely on DotSlash being installed.
# See https://github.com/openai/codex/pull/7617.
- name: Install DotSlash
@@ -39,7 +33,7 @@ runs:
run: Copy-Item (Get-Command dotslash).Source -Destination "$env:LOCALAPPDATA\Microsoft\WindowsApps\dotslash.exe"
- name: Set up Bazel
uses: bazelbuild/setup-bazelisk@b39c379c82683a5f25d34f0d062761f62693e0b2 # v3
uses: bazel-contrib/setup-bazel@c5acdfb288317d0b5c0bbd7a396a3dc868bb0f86 # 0.19.0
- name: Configure Bazel repository cache
id: configure_bazel_repository_cache
@@ -122,6 +116,11 @@ runs:
}
}
- name: Compute cache-stable Windows Bazel PATH
if: runner.os == 'Windows'
shell: pwsh
run: ./.github/scripts/compute-bazel-windows-path.ps1
- name: Enable Git long paths (Windows)
if: runner.os == 'Windows'
shell: pwsh

View File

@@ -4,6 +4,9 @@ inputs:
target:
description: Target triple for the artifacts to sign.
required: true
binaries:
description: Space-delimited binary basenames to sign.
default: "codex codex-responses-api-proxy codex-windows-sandbox-setup codex-command-runner"
client-id:
description: Azure Trusted Signing client ID.
required: true
@@ -33,6 +36,23 @@ runs:
tenant-id: ${{ inputs.tenant-id }}
subscription-id: ${{ inputs.subscription-id }}
- name: Prepare file list
id: prepare
shell: bash
env:
TARGET: ${{ inputs.target }}
BINARIES: ${{ inputs.binaries }}
run: |
set -euo pipefail
{
echo "files<<EOF"
for binary in ${BINARIES}; do
echo "${GITHUB_WORKSPACE}/codex-rs/target/${TARGET}/release/${binary}.exe"
done
echo "EOF"
} >> "$GITHUB_OUTPUT"
- name: Sign Windows binaries with Azure Trusted Signing
uses: azure/trusted-signing-action@1d365fec12862c4aa68fcac418143d73f0cea293 # v0
with:
@@ -50,8 +70,4 @@ runs:
exclude-azure-developer-cli-credential: true
exclude-interactive-browser-credential: true
cache-dependencies: false
files: |
${{ github.workspace }}/codex-rs/target/${{ inputs.target }}/release/codex.exe
${{ github.workspace }}/codex-rs/target/${{ inputs.target }}/release/codex-responses-api-proxy.exe
${{ github.workspace }}/codex-rs/target/${{ inputs.target }}/release/codex-windows-sandbox-setup.exe
${{ github.workspace }}/codex-rs/target/${{ inputs.target }}/release/codex-command-runner.exe
files: ${{ steps.prepare.outputs.files }}

View File

@@ -28,6 +28,34 @@
}
}
},
"codex-app-server": {
"platforms": {
"macos-aarch64": {
"regex": "^codex-app-server-aarch64-apple-darwin\\.zst$",
"path": "codex-app-server"
},
"macos-x86_64": {
"regex": "^codex-app-server-x86_64-apple-darwin\\.zst$",
"path": "codex-app-server"
},
"linux-x86_64": {
"regex": "^codex-app-server-x86_64-unknown-linux-musl\\.zst$",
"path": "codex-app-server"
},
"linux-aarch64": {
"regex": "^codex-app-server-aarch64-unknown-linux-musl\\.zst$",
"path": "codex-app-server"
},
"windows-x86_64": {
"regex": "^codex-app-server-x86_64-pc-windows-msvc\\.exe\\.zst$",
"path": "codex-app-server.exe"
},
"windows-aarch64": {
"regex": "^codex-app-server-aarch64-pc-windows-msvc\\.exe\\.zst$",
"path": "codex-app-server.exe"
}
}
},
"codex-responses-api-proxy": {
"platforms": {
"macos-aarch64": {

View File

@@ -1,6 +1,6 @@
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated "Contributing" markdown file or your PR may be closed:
External code contributions are by invitation only. Please read the dedicated "Contributing" markdown file for details:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text with a detailed and high quality description of your changes.

View File

@@ -0,0 +1,105 @@
<#
BuildBuddy cache keys include the action and test environment, so Bazel should
not inherit the full hosted-runner PATH on Windows. That PATH includes volatile
tool entries, such as Maven, that can change independently of this repo and
cause avoidable cache misses.
This script derives a smaller, cache-stable PATH that keeps the Windows
toolchain entries Bazel-backed CI tasks need: MSVC and Windows SDK paths, Git,
PowerShell, Node, Python, DotSlash, and the standard Windows system
directories.
`setup-bazel-ci` runs this after exporting the MSVC environment, and the script
publishes the result via `GITHUB_ENV` as `CODEX_BAZEL_WINDOWS_PATH` so later
steps can pass that explicit PATH to Bazel.
#>
$stablePathEntries = New-Object System.Collections.Generic.List[string]
$seenEntries = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
$windowsAppsPath = if ([string]::IsNullOrWhiteSpace($env:LOCALAPPDATA)) {
$null
} else {
"$($env:LOCALAPPDATA)\Microsoft\WindowsApps"
}
$windowsDir = if ($env:WINDIR) {
$env:WINDIR
} elseif ($env:SystemRoot) {
$env:SystemRoot
} else {
$null
}
function Add-StablePathEntry {
param([string]$PathEntry)
if ([string]::IsNullOrWhiteSpace($PathEntry)) {
return
}
if ($seenEntries.Add($PathEntry)) {
[void]$stablePathEntries.Add($PathEntry)
}
}
foreach ($pathEntry in ($env:PATH -split ';')) {
if ([string]::IsNullOrWhiteSpace($pathEntry)) {
continue
}
if (
$pathEntry -like '*Microsoft Visual Studio*' -or
$pathEntry -like '*Windows Kits*' -or
$pathEntry -like '*Microsoft SDKs*' -or
$pathEntry -like 'C:\Program Files\Git\*' -or
$pathEntry -like 'C:\Program Files\PowerShell\*' -or
$pathEntry -like 'C:\hostedtoolcache\windows\node\*' -or
$pathEntry -like 'C:\hostedtoolcache\windows\Python\*' -or
$pathEntry -eq 'D:\a\_temp\install-dotslash\bin' -or
($windowsDir -and ($pathEntry -eq $windowsDir -or $pathEntry -like "${windowsDir}\*"))
) {
Add-StablePathEntry $pathEntry
}
}
$gitCommand = Get-Command git -ErrorAction SilentlyContinue
if ($gitCommand) {
Add-StablePathEntry (Split-Path $gitCommand.Source -Parent)
}
$nodeCommand = Get-Command node -ErrorAction SilentlyContinue
if ($nodeCommand) {
Add-StablePathEntry (Split-Path $nodeCommand.Source -Parent)
}
$python3Command = Get-Command python3 -ErrorAction SilentlyContinue
if ($python3Command) {
Add-StablePathEntry (Split-Path $python3Command.Source -Parent)
}
$pythonCommand = Get-Command python -ErrorAction SilentlyContinue
if ($pythonCommand) {
Add-StablePathEntry (Split-Path $pythonCommand.Source -Parent)
}
$pwshCommand = Get-Command pwsh -ErrorAction SilentlyContinue
if ($pwshCommand) {
Add-StablePathEntry (Split-Path $pwshCommand.Source -Parent)
}
if ($windowsAppsPath) {
Add-StablePathEntry $windowsAppsPath
}
if ($stablePathEntries.Count -eq 0) {
throw 'Failed to derive cache-stable Windows PATH.'
}
if ([string]::IsNullOrWhiteSpace($env:GITHUB_ENV)) {
throw 'GITHUB_ENV must be set.'
}
$stablePath = $stablePathEntries -join ';'
Write-Host 'Derived CODEX_BAZEL_WINDOWS_PATH entries:'
foreach ($pathEntry in $stablePathEntries) {
Write-Host " $pathEntry"
}
"CODEX_BAZEL_WINDOWS_PATH=$stablePath" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append

View File

@@ -2,16 +2,6 @@
set -euo pipefail
ci_config=ci-linux
case "${RUNNER_OS:-}" in
macOS)
ci_config=ci-macos
;;
Windows)
ci_config=ci-windows
;;
esac
bazel_lint_args=("$@")
if [[ "${RUNNER_OS:-}" == "Windows" ]]; then
has_host_platform_override=0
@@ -44,29 +34,6 @@ if [[ "${RUNNER_OS:-}" == "Windows" ]]; then
bazel_lint_args+=("--skip_incompatible_explicit_targets")
fi
bazel_startup_args=()
if [[ -n "${BAZEL_OUTPUT_USER_ROOT:-}" ]]; then
bazel_startup_args+=("--output_user_root=${BAZEL_OUTPUT_USER_ROOT}")
fi
run_bazel() {
if [[ "${RUNNER_OS:-}" == "Windows" ]]; then
MSYS2_ARG_CONV_EXCL='*' bazel "$@"
return
fi
bazel "$@"
}
run_bazel_with_startup_args() {
if [[ ${#bazel_startup_args[@]} -gt 0 ]]; then
run_bazel "${bazel_startup_args[@]}" "$@"
return
fi
run_bazel "$@"
}
read_query_labels() {
local query="$1"
local query_stdout
@@ -74,12 +41,10 @@ read_query_labels() {
query_stdout="$(mktemp)"
query_stderr="$(mktemp)"
if ! run_bazel_with_startup_args \
--noexperimental_remote_repo_contents_cache \
query \
if ! ./.github/scripts/run-bazel-query-ci.sh \
--keep_going \
--output=label \
"$query" >"$query_stdout" 2>"$query_stderr"; then
-- "$query" >"$query_stdout" 2>"$query_stderr"; then
cat "$query_stderr" >&2
rm -f "$query_stdout" "$query_stderr"
exit 1

View File

@@ -4,7 +4,6 @@ set -euo pipefail
print_failed_bazel_test_logs=0
print_failed_bazel_action_summary=0
use_node_test_env=0
remote_download_toplevel=0
windows_msvc_host_platform=0
@@ -18,10 +17,6 @@ while [[ $# -gt 0 ]]; do
print_failed_bazel_action_summary=1
shift
;;
--use-node-test-env)
use_node_test_env=1
shift
;;
--remote-download-toplevel)
remote_download_toplevel=1
shift
@@ -42,7 +37,7 @@ while [[ $# -gt 0 ]]; do
done
if [[ $# -eq 0 ]]; then
echo "Usage: $0 [--print-failed-test-logs] [--print-failed-action-summary] [--use-node-test-env] [--remote-download-toplevel] [--windows-msvc-host-platform] -- <bazel args> -- <targets>" >&2
echo "Usage: $0 [--print-failed-test-logs] [--print-failed-action-summary] [--remote-download-toplevel] [--windows-msvc-host-platform] -- <bazel args> -- <targets>" >&2
exit 1
fi
@@ -249,16 +244,6 @@ if [[ ${#bazel_args[@]} -eq 0 || ${#bazel_targets[@]} -eq 0 ]]; then
exit 1
fi
if [[ $use_node_test_env -eq 1 ]]; then
# Bazel test sandboxes on macOS may resolve an older Homebrew `node`
# before the `actions/setup-node` runtime on PATH.
node_bin="$(which node)"
if [[ "${RUNNER_OS:-}" == "Windows" ]]; then
node_bin="$(cygpath -w "${node_bin}")"
fi
bazel_args+=("--test_env=CODEX_JS_REPL_NODE_PATH=${node_bin}")
fi
post_config_bazel_args=()
if [[ "${RUNNER_OS:-}" == "Windows" && $windows_msvc_host_platform -eq 1 ]]; then
has_host_platform_override=0
@@ -306,7 +291,6 @@ if [[ "${RUNNER_OS:-}" == "Windows" ]]; then
INCLUDE
LIB
LIBPATH
PATH
UCRTVersion
UniversalCRTSdkDir
VCINSTALLDIR
@@ -323,6 +307,17 @@ if [[ "${RUNNER_OS:-}" == "Windows" ]]; then
post_config_bazel_args+=("--action_env=${env_var}" "--host_action_env=${env_var}")
fi
done
if [[ -z "${CODEX_BAZEL_WINDOWS_PATH:-}" ]]; then
echo "CODEX_BAZEL_WINDOWS_PATH must be set for Windows Bazel CI." >&2
exit 1
fi
post_config_bazel_args+=(
"--action_env=PATH=${CODEX_BAZEL_WINDOWS_PATH}"
"--host_action_env=PATH=${CODEX_BAZEL_WINDOWS_PATH}"
"--test_env=PATH=${CODEX_BAZEL_WINDOWS_PATH}"
)
fi
bazel_console_log="$(mktemp)"

75
.github/scripts/run-bazel-query-ci.sh vendored Executable file
View File

@@ -0,0 +1,75 @@
#!/usr/bin/env bash
set -euo pipefail
# Run Bazel queries with the same CI startup settings as the main build/test
# invocation so target-discovery queries can reuse the same Bazel server.
query_args=()
while [[ $# -gt 0 ]]; do
case "$1" in
--)
shift
break
;;
*)
query_args+=("$1")
shift
;;
esac
done
if [[ $# -ne 1 ]]; then
echo "Usage: $0 [<bazel query args>...] -- <query expression>" >&2
exit 1
fi
query_expression="$1"
ci_config=ci-linux
case "${RUNNER_OS:-}" in
macOS)
ci_config=ci-macos
;;
Windows)
ci_config=ci-windows
;;
esac
bazel_startup_args=()
if [[ -n "${BAZEL_OUTPUT_USER_ROOT:-}" ]]; then
bazel_startup_args+=("--output_user_root=${BAZEL_OUTPUT_USER_ROOT}")
fi
run_bazel() {
if [[ "${RUNNER_OS:-}" == "Windows" ]]; then
MSYS2_ARG_CONV_EXCL='*' bazel "$@"
return
fi
bazel "$@"
}
bazel_query_args=(--noexperimental_remote_repo_contents_cache query)
if [[ -n "${BUILDBUDDY_API_KEY:-}" ]]; then
bazel_query_args+=(
"--config=${ci_config}"
"--remote_header=x-buildbuddy-api-key=${BUILDBUDDY_API_KEY}"
)
fi
if [[ -n "${BAZEL_REPO_CONTENTS_CACHE:-}" ]]; then
bazel_query_args+=("--repo_contents_cache=${BAZEL_REPO_CONTENTS_CACHE}")
fi
if [[ -n "${BAZEL_REPOSITORY_CACHE:-}" ]]; then
bazel_query_args+=("--repository_cache=${BAZEL_REPOSITORY_CACHE}")
fi
bazel_query_args+=("${query_args[@]}" "$query_expression")
if (( ${#bazel_startup_args[@]} > 0 )); then
run_bazel "${bazel_startup_args[@]}" "${bazel_query_args[@]}"
else
run_bazel "${bazel_query_args[@]}"
fi

View File

@@ -8,25 +8,9 @@ FROM ubuntu:24.04
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl git python3 ca-certificates xz-utils && \
curl git python3 ca-certificates && \
rm -rf /var/lib/apt/lists/*
COPY codex-rs/node-version.txt /tmp/node-version.txt
RUN set -eux; \
node_arch="$(dpkg --print-architecture)"; \
case "${node_arch}" in \
amd64) node_dist_arch="x64" ;; \
arm64) node_dist_arch="arm64" ;; \
*) echo "unsupported architecture: ${node_arch}"; exit 1 ;; \
esac; \
node_version="$(tr -d '[:space:]' </tmp/node-version.txt)"; \
curl -fsSLO "https://nodejs.org/dist/v${node_version}/node-v${node_version}-linux-${node_dist_arch}.tar.xz"; \
tar -xJf "node-v${node_version}-linux-${node_dist_arch}.tar.xz" -C /usr/local --strip-components=1; \
rm "node-v${node_version}-linux-${node_dist_arch}.tar.xz" /tmp/node-version.txt; \
node --version; \
npm --version
# Install dotslash.
RUN curl -LSfs "https://github.com/facebook/dotslash/releases/download/v0.5.8/dotslash-ubuntu-22.04.$(uname -m).tar.gz" | tar fxz - -C /usr/local/bin

View File

@@ -17,6 +17,13 @@ concurrency:
cancel-in-progress: ${{ github.ref_name != 'main' }}
jobs:
test:
# Even though a no-cache-hit Windows build seems to exceed the 30-minute
# limit on occasion, the more common reason for exceeding the limit is a
# true test failure in a rust_test() marked "flaky" that gets run 3x.
# In that case, extra time generally does not give us more signal.
#
# Ultimately we need true distributed builds (e.g.,
# https://www.buildbuddy.io/docs/rbe-setup/) to speed things up.
timeout-minutes: 30
strategy:
fail-fast: false
@@ -85,7 +92,6 @@ jobs:
bazel_wrapper_args=(
--print-failed-test-logs
--use-node-test-env
)
bazel_test_args=(
test

View File

@@ -17,10 +17,10 @@ jobs:
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- name: Install Rust toolchain
uses: dtolnay/rust-toolchain@631a55b12751854ce901bb631d5902ceb48146f7 # stable
uses: dtolnay/rust-toolchain@a0b273b48ed29de4470960879e8381ff45632f26 # 1.93.0
- name: Run cargo-deny
uses: EmbarkStudios/cargo-deny-action@82eb9f621fbc699dd0918f3ea06864c14cc84246 # v2
with:
rust-version: stable
rust-version: 1.93.0
manifest-path: ./codex-rs/Cargo.toml

View File

@@ -45,11 +45,16 @@ jobs:
GH_TOKEN: ${{ github.token }}
run: |
set -euo pipefail
# Use a rust-release version that includes all native binaries.
CODEX_VERSION=0.115.0
# Use a recent successful rust-release run that published the full
# cross-platform native payload required by the npm package layout.
# Passing the workflow URL directly avoids relying on old rust-v*
# branches remaining discoverable via `gh run list --branch ...`.
CODEX_VERSION=0.125.0
WORKFLOW_URL="https://github.com/openai/codex/actions/runs/24901475298"
OUTPUT_DIR="${RUNNER_TEMP}"
python3 ./scripts/stage_npm_packages.py \
--release-version "$CODEX_VERSION" \
--workflow-url "$WORKFLOW_URL" \
--package codex \
--output-dir "$OUTPUT_DIR"
PACK_OUTPUT="${OUTPUT_DIR}/codex-npm-${CODEX_VERSION}.tgz"

View File

@@ -61,7 +61,7 @@ jobs:
# .github/prompts/issue-deduplicator.txt file is obsolete and removed.
- id: codex-all
name: Find duplicates (pass 1, all issues)
uses: openai/codex-action@0b91f4a2703c23df3102c3f0967d3c6db34eedef # v1
uses: openai/codex-action@5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02 # v1.7
with:
openai-api-key: ${{ secrets.CODEX_OPENAI_API_KEY }}
allow-users: "*"
@@ -195,7 +195,7 @@ jobs:
- id: codex-open
name: Find duplicates (pass 2, open issues)
uses: openai/codex-action@0b91f4a2703c23df3102c3f0967d3c6db34eedef # v1
uses: openai/codex-action@5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02 # v1.7
with:
openai-api-key: ${{ secrets.CODEX_OPENAI_API_KEY }}
allow-users: "*"

View File

@@ -20,7 +20,7 @@ jobs:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- id: codex
uses: openai/codex-action@0b91f4a2703c23df3102c3f0967d3c6db34eedef # v1
uses: openai/codex-action@5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02 # v1.7
with:
openai-api-key: ${{ secrets.CODEX_OPENAI_API_KEY }}
allow-users: "*"

View File

@@ -76,6 +76,8 @@ jobs:
- name: Test argument comment lint package
working-directory: tools/argument-comment-lint
run: cargo test
env:
RUST_MIN_STACK: "8388608" # 8 MiB
argument_comment_lint_prebuilt:
name: Argument comment lint - ${{ matrix.name }}
@@ -558,10 +560,6 @@ jobs:
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- name: Set up Node.js for js_repl tests
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6
with:
node-version-file: codex-rs/node-version.txt
- name: Install Linux build dependencies
if: ${{ runner.os == 'Linux' }}
shell: bash
@@ -671,6 +669,7 @@ jobs:
run: cargo nextest run --no-fail-fast --target ${{ matrix.target }} --cargo-profile ci-test --timings
env:
RUST_BACKTRACE: 1
RUST_MIN_STACK: "8388608" # 8 MiB
NEXTEST_STATUS_LEVEL: leak
- name: Upload Cargo timings (nextest)

View File

@@ -41,6 +41,7 @@ jobs:
for f in "${files[@]}"; do
[[ $f == codex-rs/* ]] && codex=true
[[ $f == codex-rs/* || $f == tools/argument-comment-lint/* || $f == justfile ]] && argument_comment_lint=true
[[ $f == defs.bzl || $f == workspace_root_test_launcher.sh.tpl || $f == workspace_root_test_launcher.bat.tpl ]] && argument_comment_lint=true
[[ $f == tools/argument-comment-lint/* || $f == .github/workflows/rust-ci.yml || $f == .github/workflows/rust-ci-full.yml ]] && argument_comment_lint_package=true
[[ $f == .github/* ]] && workflows=true
done
@@ -130,13 +131,14 @@ jobs:
- name: Test argument comment lint package
working-directory: tools/argument-comment-lint
run: cargo test
env:
RUST_MIN_STACK: "8388608" # 8 MiB
argument_comment_lint_prebuilt:
name: Argument comment lint - ${{ matrix.name }}
runs-on: ${{ matrix.runs_on || matrix.runner }}
timeout-minutes: ${{ matrix.timeout_minutes }}
needs: changed
if: ${{ needs.changed.outputs.argument_comment_lint == 'true' || needs.changed.outputs.workflows == 'true' }}
strategy:
fail-fast: false
matrix:
@@ -154,43 +156,28 @@ jobs:
group: codex-runners
labels: codex-windows-x64
steps:
- name: Check whether argument comment lint should run
id: argument_comment_lint_gate
shell: bash
env:
ARGUMENT_COMMENT_LINT: ${{ needs.changed.outputs.argument_comment_lint }}
WORKFLOWS: ${{ needs.changed.outputs.workflows }}
run: |
if [[ "$ARGUMENT_COMMENT_LINT" == "true" || "$WORKFLOWS" == "true" ]]; then
echo "run=true" >> "$GITHUB_OUTPUT"
exit 0
fi
echo "No argument-comment-lint relevant changes."
echo "run=false" >> "$GITHUB_OUTPUT"
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- uses: ./.github/actions/setup-bazel-ci
if: ${{ steps.argument_comment_lint_gate.outputs.run == 'true' }}
- name: Run argument comment lint on codex-rs via Bazel
if: ${{ steps.argument_comment_lint_gate.outputs.run == 'true' }}
uses: ./.github/actions/run-argument-comment-lint
with:
target: ${{ runner.os }}
install-test-prereqs: true
- name: Install Linux sandbox build dependencies
if: ${{ runner.os == 'Linux' }}
shell: bash
run: |
sudo DEBIAN_FRONTEND=noninteractive apt-get update
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends pkg-config libcap-dev
- name: Run argument comment lint on codex-rs via Bazel
if: ${{ runner.os != 'Windows' }}
env:
BUILDBUDDY_API_KEY: ${{ secrets.BUILDBUDDY_API_KEY }}
shell: bash
run: |
bazel_targets="$(./tools/argument-comment-lint/list-bazel-targets.sh)"
./.github/scripts/run-bazel-ci.sh \
-- \
build \
--config=argument-comment-lint \
--keep_going \
--build_metadata=COMMIT_SHA=${GITHUB_SHA} \
-- \
${bazel_targets}
- name: Run argument comment lint on codex-rs via Bazel
if: ${{ runner.os == 'Windows' }}
env:
BUILDBUDDY_API_KEY: ${{ secrets.BUILDBUDDY_API_KEY }}
shell: bash
run: |
./.github/scripts/run-argument-comment-lint-bazel.sh \
--config=argument-comment-lint \
--platforms=//:local_windows \
--keep_going \
--build_metadata=COMMIT_SHA=${GITHUB_SHA}
buildbuddy-api-key: ${{ secrets.BUILDBUDDY_API_KEY }}
# --- Gatherer job that you mark as the ONLY required status -----------------
results:

View File

@@ -24,7 +24,9 @@ jobs:
build-windows-binaries:
name: Build Windows binaries - ${{ matrix.runner }} - ${{ matrix.target }} - ${{ matrix.bundle }}
runs-on: ${{ matrix.runs_on }}
timeout-minutes: 60
# Windows release builds can exceed an hour on fat-LTO mainline releases,
# so keep the timeout aligned with the top-level release build headroom.
timeout-minutes: 90
permissions:
contents: read
defaults:
@@ -40,28 +42,42 @@ jobs:
- runner: windows-x64
target: x86_64-pc-windows-msvc
bundle: primary
build_args: --bin codex --bin codex-responses-api-proxy
binaries: "codex codex-responses-api-proxy"
runs_on:
group: codex-runners
labels: codex-windows-x64
- runner: windows-arm64
target: aarch64-pc-windows-msvc
bundle: primary
build_args: --bin codex --bin codex-responses-api-proxy
binaries: "codex codex-responses-api-proxy"
runs_on:
group: codex-runners
labels: codex-windows-arm64
- runner: windows-x64
target: x86_64-pc-windows-msvc
bundle: helpers
build_args: --bin codex-windows-sandbox-setup --bin codex-command-runner
binaries: "codex-windows-sandbox-setup codex-command-runner"
runs_on:
group: codex-runners
labels: codex-windows-x64
- runner: windows-arm64
target: aarch64-pc-windows-msvc
bundle: helpers
build_args: --bin codex-windows-sandbox-setup --bin codex-command-runner
binaries: "codex-windows-sandbox-setup codex-command-runner"
runs_on:
group: codex-runners
labels: codex-windows-arm64
- runner: windows-x64
target: x86_64-pc-windows-msvc
bundle: app-server
binaries: "codex-app-server"
runs_on:
group: codex-runners
labels: codex-windows-x64
- runner: windows-arm64
target: aarch64-pc-windows-msvc
bundle: app-server
binaries: "codex-app-server"
runs_on:
group: codex-runners
labels: codex-windows-arm64
@@ -89,7 +105,11 @@ jobs:
- name: Cargo build (Windows binaries)
shell: bash
run: |
cargo build --target ${{ matrix.target }} --release --timings ${{ matrix.build_args }}
build_args=()
for binary in ${{ matrix.binaries }}; do
build_args+=(--bin "$binary")
done
cargo build --target ${{ matrix.target }} --release --timings "${build_args[@]}"
- name: Upload Cargo timings
uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7
@@ -103,13 +123,9 @@ jobs:
run: |
output_dir="target/${{ matrix.target }}/release/staged-${{ matrix.bundle }}"
mkdir -p "$output_dir"
if [[ "${{ matrix.bundle }}" == "primary" ]]; then
cp target/${{ matrix.target }}/release/codex.exe "$output_dir/codex.exe"
cp target/${{ matrix.target }}/release/codex-responses-api-proxy.exe "$output_dir/codex-responses-api-proxy.exe"
else
cp target/${{ matrix.target }}/release/codex-windows-sandbox-setup.exe "$output_dir/codex-windows-sandbox-setup.exe"
cp target/${{ matrix.target }}/release/codex-command-runner.exe "$output_dir/codex-command-runner.exe"
fi
for binary in ${{ matrix.binaries }}; do
cp "target/${{ matrix.target }}/release/${binary}.exe" "$output_dir/${binary}.exe"
done
- name: Upload Windows binaries
uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7
@@ -123,13 +139,15 @@ jobs:
- build-windows-binaries
name: Build - ${{ matrix.runner }} - ${{ matrix.target }}
runs-on: ${{ matrix.runs_on }}
timeout-minutes: 60
timeout-minutes: 90
permissions:
contents: read
id-token: write
defaults:
run:
working-directory: codex-rs
env:
WINDOWS_BINARIES: "codex codex-responses-api-proxy codex-windows-sandbox-setup codex-command-runner codex-app-server"
strategy:
fail-fast: false
@@ -161,19 +179,25 @@ jobs:
name: windows-binaries-${{ matrix.target }}-helpers
path: codex-rs/target/${{ matrix.target }}/release
- name: Download prebuilt Windows app-server binary
uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8
with:
name: windows-binaries-${{ matrix.target }}-app-server
path: codex-rs/target/${{ matrix.target }}/release
- name: Verify binaries
shell: bash
run: |
set -euo pipefail
ls -lh target/${{ matrix.target }}/release/codex.exe
ls -lh target/${{ matrix.target }}/release/codex-responses-api-proxy.exe
ls -lh target/${{ matrix.target }}/release/codex-windows-sandbox-setup.exe
ls -lh target/${{ matrix.target }}/release/codex-command-runner.exe
for binary in ${WINDOWS_BINARIES}; do
ls -lh "target/${{ matrix.target }}/release/${binary}.exe"
done
- name: Sign Windows binaries with Azure Trusted Signing
uses: ./.github/actions/windows-code-sign
with:
target: ${{ matrix.target }}
binaries: ${{ env.WINDOWS_BINARIES }}
client-id: ${{ secrets.AZURE_TRUSTED_SIGNING_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TRUSTED_SIGNING_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_TRUSTED_SIGNING_SUBSCRIPTION_ID }}
@@ -187,10 +211,10 @@ jobs:
dest="dist/${{ matrix.target }}"
mkdir -p "$dest"
cp target/${{ matrix.target }}/release/codex.exe "$dest/codex-${{ matrix.target }}.exe"
cp target/${{ matrix.target }}/release/codex-responses-api-proxy.exe "$dest/codex-responses-api-proxy-${{ matrix.target }}.exe"
cp target/${{ matrix.target }}/release/codex-windows-sandbox-setup.exe "$dest/codex-windows-sandbox-setup-${{ matrix.target }}.exe"
cp target/${{ matrix.target }}/release/codex-command-runner.exe "$dest/codex-command-runner-${{ matrix.target }}.exe"
for binary in ${WINDOWS_BINARIES}; do
cp "target/${{ matrix.target }}/release/${binary}.exe" \
"$dest/${binary}-${{ matrix.target }}.exe"
done
- name: Install DotSlash
uses: facebook/install-dotslash@1e4e7b3e07eaca387acb98f1d4720e0bee8dbb6a # v2

View File

@@ -20,7 +20,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- uses: dtolnay/rust-toolchain@c2b55edffaf41a251c410bb32bed22afefa800f1 # 1.92
- uses: dtolnay/rust-toolchain@a0b273b48ed29de4470960879e8381ff45632f26 # 1.93.0
- name: Validate tag matches Cargo.toml version
shell: bash
run: |
@@ -47,9 +47,11 @@ jobs:
build:
needs: tag-check
name: Build - ${{ matrix.runner }} - ${{ matrix.target }}
name: Build - ${{ matrix.runner }} - ${{ matrix.target }} - ${{ matrix.bundle }}
runs-on: ${{ matrix.runs_on || matrix.runner }}
timeout-minutes: 60
# Release builds can take a long time, so leave some headroom to avoid
# having to restart the full workflow due to a timeout.
timeout-minutes: 90
permissions:
contents: read
id-token: write
@@ -67,16 +69,53 @@ jobs:
include:
- runner: macos-15-xlarge
target: aarch64-apple-darwin
bundle: primary
artifact_name: aarch64-apple-darwin
binaries: "codex codex-responses-api-proxy"
build_dmg: "true"
- runner: macos-15-xlarge
target: aarch64-apple-darwin
bundle: app-server
artifact_name: aarch64-apple-darwin-app-server
binaries: "codex-app-server"
build_dmg: "false"
- runner: macos-15-xlarge
target: x86_64-apple-darwin
bundle: primary
artifact_name: x86_64-apple-darwin
binaries: "codex codex-responses-api-proxy"
build_dmg: "true"
- runner: macos-15-xlarge
target: x86_64-apple-darwin
bundle: app-server
artifact_name: x86_64-apple-darwin-app-server
binaries: "codex-app-server"
build_dmg: "false"
# Release artifacts intentionally ship MUSL-linked Linux binaries.
- runner: ubuntu-24.04
target: x86_64-unknown-linux-musl
bundle: primary
artifact_name: x86_64-unknown-linux-musl
binaries: "codex codex-responses-api-proxy"
build_dmg: "false"
- runner: ubuntu-24.04
target: x86_64-unknown-linux-gnu
target: x86_64-unknown-linux-musl
bundle: app-server
artifact_name: x86_64-unknown-linux-musl-app-server
binaries: "codex-app-server"
build_dmg: "false"
- runner: ubuntu-24.04-arm
target: aarch64-unknown-linux-musl
bundle: primary
artifact_name: aarch64-unknown-linux-musl
binaries: "codex codex-responses-api-proxy"
build_dmg: "false"
- runner: ubuntu-24.04-arm
target: aarch64-unknown-linux-gnu
target: aarch64-unknown-linux-musl
bundle: app-server
artifact_name: aarch64-unknown-linux-musl-app-server
binaries: "codex-app-server"
build_dmg: "false"
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
@@ -219,13 +258,17 @@ jobs:
- name: Cargo build
shell: bash
run: |
build_args=()
for binary in ${{ matrix.binaries }}; do
build_args+=(--bin "$binary")
done
echo "CARGO_PROFILE_RELEASE_LTO: ${CARGO_PROFILE_RELEASE_LTO}"
cargo build --target ${{ matrix.target }} --release --timings --bin codex --bin codex-responses-api-proxy
cargo build --target ${{ matrix.target }} --release --timings "${build_args[@]}"
- name: Upload Cargo timings
uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7
with:
name: cargo-timings-rust-release-${{ matrix.target }}
name: cargo-timings-rust-release-${{ matrix.target }}-${{ matrix.bundle }}
path: codex-rs/target/**/cargo-timings/cargo-timing.html
if-no-files-found: warn
@@ -235,12 +278,14 @@ jobs:
with:
target: ${{ matrix.target }}
artifacts-dir: ${{ github.workspace }}/codex-rs/target/${{ matrix.target }}/release
binaries: ${{ matrix.binaries }}
- if: ${{ runner.os == 'macOS' }}
name: MacOS code signing (binaries)
uses: ./.github/actions/macos-code-sign
with:
target: ${{ matrix.target }}
binaries: ${{ matrix.binaries }}
sign-binaries: "true"
sign-dmg: "false"
apple-certificate: ${{ secrets.APPLE_CERTIFICATE_P12 }}
@@ -249,7 +294,7 @@ jobs:
apple-notarization-key-id: ${{ secrets.APPLE_NOTARIZATION_KEY_ID }}
apple-notarization-issuer-id: ${{ secrets.APPLE_NOTARIZATION_ISSUER_ID }}
- if: ${{ runner.os == 'macOS' }}
- if: ${{ runner.os == 'macOS' && matrix.build_dmg == 'true' }}
name: Build macOS dmg
shell: bash
run: |
@@ -264,23 +309,17 @@ jobs:
# The previous "MacOS code signing (binaries)" step signs + notarizes the
# built artifacts in `${release_dir}`. This step packages *those same*
# signed binaries into a dmg.
codex_binary_path="${release_dir}/codex"
proxy_binary_path="${release_dir}/codex-responses-api-proxy"
rm -rf "$dmg_root"
mkdir -p "$dmg_root"
if [[ ! -f "$codex_binary_path" ]]; then
echo "Binary $codex_binary_path not found"
exit 1
fi
if [[ ! -f "$proxy_binary_path" ]]; then
echo "Binary $proxy_binary_path not found"
exit 1
fi
ditto "$codex_binary_path" "${dmg_root}/codex"
ditto "$proxy_binary_path" "${dmg_root}/codex-responses-api-proxy"
for binary in ${{ matrix.binaries }}; do
binary_path="${release_dir}/${binary}"
if [[ ! -f "${binary_path}" ]]; then
echo "Binary ${binary_path} not found"
exit 1
fi
ditto "${binary_path}" "${dmg_root}/${binary}"
done
rm -f "$dmg_path"
hdiutil create \
@@ -295,7 +334,7 @@ jobs:
exit 1
fi
- if: ${{ runner.os == 'macOS' }}
- if: ${{ runner.os == 'macOS' && matrix.build_dmg == 'true' }}
name: MacOS code signing (dmg)
uses: ./.github/actions/macos-code-sign
with:
@@ -314,15 +353,15 @@ jobs:
dest="dist/${{ matrix.target }}"
mkdir -p "$dest"
cp target/${{ matrix.target }}/release/codex "$dest/codex-${{ matrix.target }}"
cp target/${{ matrix.target }}/release/codex-responses-api-proxy "$dest/codex-responses-api-proxy-${{ matrix.target }}"
for binary in ${{ matrix.binaries }}; do
cp "target/${{ matrix.target }}/release/${binary}" "$dest/${binary}-${{ matrix.target }}"
if [[ "${{ matrix.target }}" == *linux* ]]; then
cp "target/${{ matrix.target }}/release/${binary}.sigstore" \
"$dest/${binary}-${{ matrix.target }}.sigstore"
fi
done
if [[ "${{ matrix.target }}" == *linux* ]]; then
cp target/${{ matrix.target }}/release/codex.sigstore "$dest/codex-${{ matrix.target }}.sigstore"
cp target/${{ matrix.target }}/release/codex-responses-api-proxy.sigstore "$dest/codex-responses-api-proxy-${{ matrix.target }}.sigstore"
fi
if [[ "${{ matrix.target }}" == *apple-darwin ]]; then
if [[ "${{ matrix.build_dmg }}" == "true" ]]; then
cp target/${{ matrix.target }}/release/codex-${{ matrix.target }}.dmg "$dest/codex-${{ matrix.target }}.dmg"
fi
@@ -364,7 +403,7 @@ jobs:
- uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7
with:
name: ${{ matrix.target }}
name: ${{ matrix.artifact_name }}
# Upload the per-binary .zst files as well as the new .tar.gz
# equivalents we generated in the previous step.
path: |
@@ -614,11 +653,59 @@ jobs:
prefix="${NPM_TAG}-"
fi
root_tarball="dist/npm/codex-npm-${VERSION}.tgz"
sdk_tarball="dist/npm/codex-sdk-npm-${VERSION}.tgz"
# Keep this list in sync with CODEX_PLATFORM_PACKAGES in
# codex-cli/scripts/build_npm_package.py. The root wrapper advances
# @openai/codex@latest as soon as it publishes, so every platform
# package it aliases must already exist in the registry first.
platform_tarballs=(
"dist/npm/codex-npm-linux-x64-${VERSION}.tgz"
"dist/npm/codex-npm-linux-arm64-${VERSION}.tgz"
"dist/npm/codex-npm-darwin-x64-${VERSION}.tgz"
"dist/npm/codex-npm-darwin-arm64-${VERSION}.tgz"
"dist/npm/codex-npm-win32-x64-${VERSION}.tgz"
"dist/npm/codex-npm-win32-arm64-${VERSION}.tgz"
)
for required_tarball in "${platform_tarballs[@]}" "${root_tarball}"; do
if [[ ! -f "${required_tarball}" ]]; then
echo "Missing npm tarball: ${required_tarball}"
exit 1
fi
done
shopt -s nullglob
tarballs=(dist/npm/*-"${VERSION}".tgz)
if [[ ${#tarballs[@]} -eq 0 ]]; then
echo "No npm tarballs found in dist/npm for version ${VERSION}"
exit 1
other_tarballs=()
for tarball in dist/npm/*-"${VERSION}".tgz; do
if [[ "${tarball}" == "${root_tarball}" || "${tarball}" == "${sdk_tarball}" ]]; then
continue
fi
is_platform_tarball=false
for platform_tarball in "${platform_tarballs[@]}"; do
if [[ "${tarball}" == "${platform_tarball}" ]]; then
is_platform_tarball=true
break
fi
done
if [[ "${is_platform_tarball}" == true ]]; then
continue
fi
other_tarballs+=("${tarball}")
done
# Publish the platform packages before the root CLI wrapper. The root
# wrapper advances @openai/codex@latest, so it should only publish
# after the optional dependency versions it references exist.
tarballs=(
"${platform_tarballs[@]}"
"${other_tarballs[@]}"
"${root_tarball}"
)
if [[ -f "${sdk_tarball}" ]]; then
tarballs+=("${sdk_tarball}")
fi
for tarball in "${tarballs[@]}"; do

View File

@@ -78,7 +78,9 @@ jobs:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- name: Set up Bazel
uses: bazelbuild/setup-bazelisk@b39c379c82683a5f25d34f0d062761f62693e0b2 # v3
uses: ./.github/actions/setup-bazel-ci
with:
target: ${{ matrix.target }}
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6

View File

@@ -3,6 +3,7 @@ name: v8-canary
on:
pull_request:
paths:
- ".github/actions/setup-bazel-ci/**"
- ".github/scripts/rusty_v8_bazel.py"
- ".github/workflows/rusty-v8-release.yml"
- ".github/workflows/v8-canary.yml"
@@ -16,6 +17,7 @@ on:
branches:
- main
paths:
- ".github/actions/setup-bazel-ci/**"
- ".github/scripts/rusty_v8_bazel.py"
- ".github/workflows/rusty-v8-release.yml"
- ".github/workflows/v8-canary.yml"
@@ -75,7 +77,9 @@ jobs:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- name: Set up Bazel
uses: bazelbuild/setup-bazelisk@b39c379c82683a5f25d34f0d062761f62693e0b2 # v3
uses: ./.github/actions/setup-bazel-ci
with:
target: ${{ matrix.target }}
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6

2
.gitignore vendored
View File

@@ -52,6 +52,7 @@ yarn-error.log*
# env
.env*
!.env.example
.venv/
# package
*.tgz
@@ -91,4 +92,3 @@ CHANGELOG.ignore.md
# Python bytecode files
__pycache__/
*.pyc

View File

@@ -19,6 +19,12 @@ In the codex-rs folder where the rust code lives:
- You can run `just argument-comment-lint` to run the lint check locally. This is powered by Bazel, so running it the first time can be slow if Bazel is not warmed up, though incremental invocations should take <15s. Most of the time, it is best to update the PR and let CI take responsibility for checking this (or run it asynchronously in the background after submitting the PR). Note CI checks all three platforms, which the local run does not.
- When possible, make `match` statements exhaustive and avoid wildcard arms.
- Newly added traits should include doc comments that explain their role and how implementations are expected to use them.
- Discourage both `#[async_trait]` and `#[allow(async_fn_in_trait)]` in Rust traits.
- Prefer native RPITIT trait methods with explicit `Send` bounds on the returned future, as in `3c7f013f9735` / `#16630`.
- Preferred trait shape:
`fn foo(&self, ...) -> impl std::future::Future<Output = T> + Send;`
- Implementations may still use `async fn foo(&self, ...) -> T` when they satisfy that contract.
- Do not use `#[allow(async_fn_in_trait)]` as a shortcut around spelling the future contract explicitly.
- When writing tests, prefer comparing the equality of entire objects over fields one by one.
- When making a change that adds or changes an API, ensure that the documentation in the `docs/` folder is up to date if applicable.
- Prefer private modules and explicitly exported public crate API.

43
MODULE.bazel.lock generated

File diff suppressed because one or more lines are too long

3
NOTICE
View File

@@ -4,6 +4,3 @@ Copyright 2025 OpenAI
This project includes code derived from [Ratatui](https://github.com/ratatui/ratatui), licensed under the MIT license.
Copyright (c) 2016-2022 Florian Dehau
Copyright (c) 2023-2025 The Ratatui Developers
This project includes Meriyah parser assets from [meriyah](https://github.com/meriyah/meriyah), licensed under the ISC license.
Copyright (c) 2019 and later, KFlash and others.

View File

@@ -1 +0,0 @@
node_modules/

View File

@@ -1,59 +0,0 @@
FROM node:24-slim
ARG TZ
ENV TZ="$TZ"
# Install basic development tools, ca-certificates, and iptables/ipset, then clean up apt cache to reduce image size
RUN apt-get update && apt-get install -y --no-install-recommends \
aggregate \
ca-certificates \
curl \
dnsutils \
fzf \
gh \
git \
gnupg2 \
iproute2 \
ipset \
iptables \
jq \
less \
man-db \
procps \
unzip \
ripgrep \
zsh \
&& rm -rf /var/lib/apt/lists/*
# Ensure default node user has access to /usr/local/share
RUN mkdir -p /usr/local/share/npm-global && \
chown -R node:node /usr/local/share
ARG USERNAME=node
# Set up non-root user
USER node
# Install global packages
ENV NPM_CONFIG_PREFIX=/usr/local/share/npm-global
ENV PATH=$PATH:/usr/local/share/npm-global/bin
# Install codex
COPY dist/codex.tgz codex.tgz
RUN npm install -g codex.tgz \
&& npm cache clean --force \
&& rm -rf /usr/local/share/npm-global/lib/node_modules/codex-cli/node_modules/.cache \
&& rm -rf /usr/local/share/npm-global/lib/node_modules/codex-cli/tests \
&& rm -rf /usr/local/share/npm-global/lib/node_modules/codex-cli/docs
# Inside the container we consider the environment already sufficiently locked
# down, therefore instruct Codex CLI to allow running without sandboxing.
ENV CODEX_UNSAFE_ALLOW_NO_SANDBOX=1
# Copy and set up firewall script as root.
USER root
COPY scripts/init_firewall.sh /usr/local/bin/
RUN chmod 500 /usr/local/bin/init_firewall.sh
# Drop back to non-root.
USER node

View File

@@ -18,5 +18,5 @@
"url": "git+https://github.com/openai/codex.git",
"directory": "codex-cli"
},
"packageManager": "pnpm@10.29.3+sha512.498e1fb4cca5aa06c1dcf2611e6fafc50972ffe7189998c409e90de74566444298ffe43e6cd2acdc775ba1aa7cc5e092a8b7054c811ba8c5770f84693d33d2dc"
"packageManager": "pnpm@10.33.0+sha512.10568bb4a6afb58c9eb3630da90cc9516417abebd3fabbe6739f0ae795728da1491e9db5a544c76ad8eb7570f5c4bb3d6c637b2cb41bfdcdb47fa823c8649319"
}

View File

@@ -1,16 +0,0 @@
#!/bin/bash
set -euo pipefail
SCRIPT_DIR=$(realpath "$(dirname "$0")")
trap "popd >> /dev/null" EXIT
pushd "$SCRIPT_DIR/.." >> /dev/null || {
echo "Error: Failed to change directory to $SCRIPT_DIR/.."
exit 1
}
pnpm install
pnpm run build
rm -rf ./dist/openai-codex-*.tgz
pnpm pack --pack-destination ./dist
mv ./dist/openai-codex-*.tgz ./dist/codex.tgz
docker build -t codex -f "./Dockerfile" .

View File

@@ -92,4 +92,4 @@ quoted_args=""
for arg in "$@"; do
quoted_args+=" $(printf '%q' "$arg")"
done
docker exec -it "$CONTAINER_NAME" bash -c "cd \"/app$WORK_DIR\" && codex --full-auto ${quoted_args}"
docker exec -it "$CONTAINER_NAME" bash -c "cd \"/app$WORK_DIR\" && codex --sandbox workspace-write --ask-for-approval on-request ${quoted_args}"

View File

@@ -6,5 +6,6 @@ ignore = [
"RUSTSEC-2024-0436", # paste 1.0.15 via starlark/ratatui; upstream crate is unmaintained
"RUSTSEC-2024-0320", # yaml-rust via syntect; remove when syntect drops or updates it
"RUSTSEC-2025-0141", # bincode via syntect; remove when syntect drops or updates it
"RUSTSEC-2026-0097", # rand 0.8.5 via age/codex-secrets and zbus/keyring; remove when transitive deps move to rand >=0.9.3
"RUSTSEC-2026-0118", # hickory-proto via rama-dns/rama-tcp; remove when rama updates to hickory 0.26.1 or hickory-net
"RUSTSEC-2026-0119", # hickory-proto via rama-dns/rama-tcp; remove when rama updates to hickory 0.26.1 or hickory-net
]

View File

@@ -8,6 +8,11 @@ max-threads = 1
[test-groups.app_server_integration]
max-threads = 1
[test-groups.core_apply_patch_cli_integration]
max-threads = 1
[test-groups.windows_sandbox_legacy_sessions]
max-threads = 1
[[profile.default.overrides]]
# Do not add new tests here
@@ -27,3 +32,15 @@ test-group = 'app_server_protocol_codegen'
# Keep the library unit tests parallel.
filter = 'package(codex-app-server) & kind(test)'
test-group = 'app_server_integration'
[[profile.default.overrides]]
# These tests exercise full Codex turns and apply_patch execution, and they are
# sensitive to Windows runner process-startup stalls when many cases launch at once.
filter = 'package(codex-core) & kind(test) & test(apply_patch_cli)'
test-group = 'core_apply_patch_cli_integration'
[[profile.default.overrides]]
# These tests create restricted-token Windows child processes and private desktops.
# Serialize them to avoid exhausting Windows session/global desktop resources in CI.
filter = 'package(codex-windows-sandbox) & test(legacy_)'
test-group = 'windows_sandbox_legacy_sessions'

View File

@@ -17,7 +17,7 @@ jobs:
working-directory: codex-rs
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- uses: dtolnay/rust-toolchain@a0b273b48ed29de4470960879e8381ff45632f26 # 1.93.0
- name: Install cargo-audit
uses: taiki-e/install-action@v2
with:

View File

@@ -1,6 +1,5 @@
exports_files([
"clippy.toml",
"node-version.txt",
])
filegroup(

956
codex-rs/Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -1,10 +1,14 @@
[workspace]
members = [
"aws-auth",
"analytics",
"agent-graph-store",
"agent-identity",
"backend-client",
"ansi-escape",
"async-utils",
"app-server",
"app-server-transport",
"app-server-client",
"app-server-protocol",
"app-server-test-client",
@@ -24,18 +28,23 @@ members = [
"collaboration-mode-templates",
"connectors",
"config",
"device-key",
"shell-command",
"shell-escalation",
"skills",
"core",
"core-api",
"core-plugins",
"core-skills",
"hooks",
"secrets",
"exec",
"file-system",
"exec-server",
"execpolicy",
"execpolicy-legacy",
"external-agent-migration",
"external-agent-sessions",
"keyring-store",
"file-search",
"linux-sandbox",
@@ -43,6 +52,8 @@ members = [
"login",
"codex-mcp",
"mcp-server",
"memories/read",
"memories/write",
"model-provider-info",
"models-manager",
"network-proxy",
@@ -51,6 +62,7 @@ members = [
"protocol",
"realtime-webrtc",
"rollout",
"rollout-trace",
"rmcp-client",
"responses-api-proxy",
"response-debug-context",
@@ -88,6 +100,7 @@ members = [
"state",
"terminal-detection",
"test-binary-support",
"thread-manager-sample",
"thread-store",
"uds",
"codex-experimental-api-macros",
@@ -109,9 +122,13 @@ license = "Apache-2.0"
# Internal
app_test_support = { path = "app-server/tests/common" }
codex-analytics = { path = "analytics" }
codex-agent-graph-store = { path = "agent-graph-store" }
codex-agent-identity = { path = "agent-identity" }
codex-ansi-escape = { path = "ansi-escape" }
codex-api = { path = "codex-api" }
codex-aws-auth = { path = "aws-auth" }
codex-app-server = { path = "app-server" }
codex-app-server-transport = { path = "app-server-transport" }
codex-app-server-client = { path = "app-server-client" }
codex-app-server-protocol = { path = "app-server-protocol" }
codex-app-server-test-client = { path = "app-server-test-client" }
@@ -130,11 +147,16 @@ codex-code-mode = { path = "code-mode" }
codex-config = { path = "config" }
codex-connectors = { path = "connectors" }
codex-core = { path = "core" }
codex-core-api = { path = "core-api" }
codex-core-plugins = { path = "core-plugins" }
codex-core-skills = { path = "core-skills" }
codex-device-key = { path = "device-key" }
codex-exec = { path = "exec" }
codex-file-system = { path = "file-system" }
codex-exec-server = { path = "exec-server" }
codex-execpolicy = { path = "execpolicy" }
codex-external-agent-migration = { path = "external-agent-migration" }
codex-external-agent-sessions = { path = "external-agent-sessions" }
codex-experimental-api-macros = { path = "codex-experimental-api-macros" }
codex-features = { path = "features" }
codex-feedback = { path = "feedback" }
@@ -146,6 +168,8 @@ codex-keyring-store = { path = "keyring-store" }
codex-linux-sandbox = { path = "linux-sandbox" }
codex-lmstudio = { path = "lmstudio" }
codex-login = { path = "login" }
codex-memories-read = { path = "memories/read" }
codex-memories-write = { path = "memories/write" }
codex-mcp = { path = "codex-mcp" }
codex-mcp-server = { path = "mcp-server" }
codex-model-provider-info = { path = "model-provider-info" }
@@ -162,6 +186,7 @@ codex-responses-api-proxy = { path = "responses-api-proxy" }
codex-response-debug-context = { path = "response-debug-context" }
codex-rmcp-client = { path = "rmcp-client" }
codex-rollout = { path = "rollout" }
codex-rollout-trace = { path = "rollout-trace" }
codex-sandboxing = { path = "sandboxing" }
codex-secrets = { path = "secrets" }
codex-shell-command = { path = "shell-command" }
@@ -215,6 +240,10 @@ async-channel = "2.3.1"
async-io = "2.6.0"
async-stream = "0.3.6"
async-trait = "0.1.89"
aws-config = "1"
aws-credential-types = "1"
aws-sigv4 = "1"
aws-types = "1"
axum = { version = "0.8", default-features = false }
base64 = "0.22.1"
bm25 = "2.3.2"
@@ -226,8 +255,8 @@ clap_complete = "4"
color-eyre = "0.6.3"
constant_time_eq = "0.3.1"
crossbeam-channel = "0.5.15"
crossterm = "0.28.1"
crypto_box = { version = "0.9.1", features = ["seal"] }
crossterm = "0.28.1"
csv = "1.3.1"
ctor = "0.6.3"
deno_core_icudata = "0.77.0"
@@ -242,6 +271,7 @@ encoding_rs = "0.8.35"
env-flags = "0.1.1"
env_logger = "0.11.9"
eventsource-stream = "0.2.3"
flate2 = "1.1.8"
futures = { version = "0.3", default-features = false }
gethostname = "1.1.0"
gix = { version = "0.81.0", default-features = false, features = ["sha1"] }
@@ -283,6 +313,7 @@ os_info = "3.12.0"
owo-colors = "4.3.0"
path-absolutize = "3.1.1"
pathdiff = "0.2"
p256 = "0.13.2"
portable-pty = "0.9.0"
predicates = "3"
pretty_assertions = "1.4.1"
@@ -293,7 +324,7 @@ ratatui = "0.29.0"
ratatui-macros = "0.6.0"
regex = "1.12.3"
regex-lite = "0.1.8"
reqwest = "0.12"
reqwest = { version = "0.12", features = ["cookies"] }
rmcp = { version = "0.15.0", default-features = false }
runfiles = { git = "https://github.com/dzbarsky/rules_rust", rev = "b56cbaa8465e74127f1ea216f813cd377295ad81" }
rustls = { version = "0.23", default-features = false, features = [
@@ -333,6 +364,7 @@ strum_macros = "0.28.0"
supports-color = "3.0.2"
syntect = "5"
sys-locale = "0.3.2"
tar = { version = "=0.4.45", default-features = false }
tempfile = "3.23.0"
test-log = "0.2.19"
textwrap = "0.16.2"
@@ -424,6 +456,7 @@ unwrap_used = "deny"
# silence the false positive here instead of deleting a real dependency.
[workspace.metadata.cargo-shear]
ignored = [
"codex-agent-graph-store",
"icu_provider",
"openssl-sys",
"codex-utils-readiness",

View File

@@ -59,19 +59,22 @@ To test to see what happens when a command is run under the sandbox provided by
```
# macOS
codex sandbox macos [--full-auto] [--log-denials] [COMMAND]...
codex sandbox macos [--log-denials] [COMMAND]...
# Linux
codex sandbox linux [--full-auto] [COMMAND]...
codex sandbox linux [COMMAND]...
# Windows
codex sandbox windows [--full-auto] [COMMAND]...
codex sandbox windows [COMMAND]...
# Legacy aliases
codex debug seatbelt [--full-auto] [--log-denials] [COMMAND]...
codex debug landlock [--full-auto] [COMMAND]...
codex debug seatbelt [--log-denials] [COMMAND]...
codex debug landlock [COMMAND]...
```
To try a writable legacy sandbox mode with these commands, pass an explicit config override such
as `-c 'sandbox_mode="workspace-write"'`.
### Selecting a sandbox policy via `--sandbox`
The Rust CLI exposes a dedicated `--sandbox` (`-s`) flag that lets you pick the sandbox policy **without** having to reach for the generic `-c/--config` option:
@@ -94,7 +97,7 @@ In `workspace-write`, Codex also includes `~/.codex/memories` in its writable ro
This folder is the root of a Cargo workspace. It contains quite a bit of experimental code, but here are the key crates:
- [`core/`](./core) contains the business logic for Codex. Ultimately, we hope this to be a library crate that is generally useful for building other Rust/native applications that use Codex.
- [`core/`](./core) contains the business logic for Codex. Ultimately, we hope this becomes a library crate that is generally useful for building other Rust/native applications that use Codex.
- [`exec/`](./exec) "headless" CLI for use in automation.
- [`tui/`](./tui) CLI that launches a fullscreen TUI built with [Ratatui](https://ratatui.rs/).
- [`cli/`](./cli) CLI multitool that provides the aforementioned CLIs via subcommands.

View File

@@ -0,0 +1,6 @@
load("//:defs.bzl", "codex_rust_crate")
codex_rust_crate(
name = "agent-graph-store",
crate_name = "codex_agent_graph_store",
)

View File

@@ -0,0 +1,25 @@
[package]
edition.workspace = true
license.workspace = true
name = "codex-agent-graph-store"
version.workspace = true
[lib]
name = "codex_agent_graph_store"
path = "src/lib.rs"
[lints]
workspace = true
[dependencies]
async-trait = { workspace = true }
codex-protocol = { workspace = true }
codex-state = { workspace = true }
serde = { workspace = true, features = ["derive"] }
thiserror = { workspace = true }
[dev-dependencies]
pretty_assertions = { workspace = true }
serde_json = { workspace = true }
tempfile = { workspace = true }
tokio = { workspace = true, features = ["macros", "rt-multi-thread"] }

View File

@@ -0,0 +1,20 @@
/// Result type returned by agent graph store operations.
pub type AgentGraphStoreResult<T> = Result<T, AgentGraphStoreError>;
/// Error type shared by agent graph store implementations.
#[derive(Debug, thiserror::Error)]
pub enum AgentGraphStoreError {
/// The caller supplied invalid request data.
#[error("invalid agent graph store request: {message}")]
InvalidRequest {
/// User-facing explanation of the invalid request.
message: String,
},
/// Catch-all for implementation failures that do not fit a more specific category.
#[error("agent graph store internal error: {message}")]
Internal {
/// User-facing explanation of the implementation failure.
message: String,
},
}

View File

@@ -0,0 +1,12 @@
//! Storage-neutral parent/child topology for thread-spawned agents.
mod error;
mod local;
mod store;
mod types;
pub use error::AgentGraphStoreError;
pub use error::AgentGraphStoreResult;
pub use local::LocalAgentGraphStore;
pub use store::AgentGraphStore;
pub use types::ThreadSpawnEdgeStatus;

View File

@@ -0,0 +1,325 @@
use async_trait::async_trait;
use codex_protocol::ThreadId;
use codex_state::StateRuntime;
use std::sync::Arc;
use crate::AgentGraphStore;
use crate::AgentGraphStoreError;
use crate::AgentGraphStoreResult;
use crate::ThreadSpawnEdgeStatus;
/// SQLite-backed implementation of [`AgentGraphStore`] using an existing state runtime.
#[derive(Clone)]
pub struct LocalAgentGraphStore {
state_db: Arc<StateRuntime>,
}
impl std::fmt::Debug for LocalAgentGraphStore {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("LocalAgentGraphStore")
.field("codex_home", &self.state_db.codex_home())
.finish_non_exhaustive()
}
}
impl LocalAgentGraphStore {
/// Create a local graph store from an already-initialized state runtime.
pub fn new(state_db: Arc<StateRuntime>) -> Self {
Self { state_db }
}
}
#[async_trait]
impl AgentGraphStore for LocalAgentGraphStore {
async fn upsert_thread_spawn_edge(
&self,
parent_thread_id: ThreadId,
child_thread_id: ThreadId,
status: ThreadSpawnEdgeStatus,
) -> AgentGraphStoreResult<()> {
self.state_db
.upsert_thread_spawn_edge(parent_thread_id, child_thread_id, to_state_status(status))
.await
.map_err(internal_error)
}
async fn set_thread_spawn_edge_status(
&self,
child_thread_id: ThreadId,
status: ThreadSpawnEdgeStatus,
) -> AgentGraphStoreResult<()> {
self.state_db
.set_thread_spawn_edge_status(child_thread_id, to_state_status(status))
.await
.map_err(internal_error)
}
async fn list_thread_spawn_children(
&self,
parent_thread_id: ThreadId,
status_filter: Option<ThreadSpawnEdgeStatus>,
) -> AgentGraphStoreResult<Vec<ThreadId>> {
if let Some(status) = status_filter {
return self
.state_db
.list_thread_spawn_children_with_status(parent_thread_id, to_state_status(status))
.await
.map_err(internal_error);
}
self.state_db
.list_thread_spawn_children(parent_thread_id)
.await
.map_err(internal_error)
}
async fn list_thread_spawn_descendants(
&self,
root_thread_id: ThreadId,
status_filter: Option<ThreadSpawnEdgeStatus>,
) -> AgentGraphStoreResult<Vec<ThreadId>> {
match status_filter {
Some(status) => self
.state_db
.list_thread_spawn_descendants_with_status(root_thread_id, to_state_status(status))
.await
.map_err(internal_error),
None => self
.state_db
.list_thread_spawn_descendants(root_thread_id)
.await
.map_err(internal_error),
}
}
}
fn to_state_status(status: ThreadSpawnEdgeStatus) -> codex_state::DirectionalThreadSpawnEdgeStatus {
match status {
ThreadSpawnEdgeStatus::Open => codex_state::DirectionalThreadSpawnEdgeStatus::Open,
ThreadSpawnEdgeStatus::Closed => codex_state::DirectionalThreadSpawnEdgeStatus::Closed,
}
}
fn internal_error(err: impl std::fmt::Display) -> AgentGraphStoreError {
AgentGraphStoreError::Internal {
message: err.to_string(),
}
}
#[cfg(test)]
mod tests {
use super::*;
use codex_state::DirectionalThreadSpawnEdgeStatus;
use pretty_assertions::assert_eq;
use tempfile::TempDir;
struct TestRuntime {
state_db: Arc<StateRuntime>,
_codex_home: TempDir,
}
fn thread_id(suffix: u128) -> ThreadId {
ThreadId::from_string(&format!("00000000-0000-0000-0000-{suffix:012}"))
.expect("valid thread id")
}
async fn state_runtime() -> TestRuntime {
let codex_home = TempDir::new().expect("tempdir should be created");
let state_db =
StateRuntime::init(codex_home.path().to_path_buf(), "test-provider".to_string())
.await
.expect("state db should initialize");
TestRuntime {
state_db,
_codex_home: codex_home,
}
}
#[tokio::test]
async fn local_store_upserts_and_lists_direct_children_with_status_filters() {
let fixture = state_runtime().await;
let state_db = fixture.state_db;
let store = LocalAgentGraphStore::new(state_db.clone());
let parent_thread_id = thread_id(/*suffix*/ 1);
let first_child_thread_id = thread_id(/*suffix*/ 2);
let second_child_thread_id = thread_id(/*suffix*/ 3);
store
.upsert_thread_spawn_edge(
parent_thread_id,
second_child_thread_id,
ThreadSpawnEdgeStatus::Closed,
)
.await
.expect("closed child edge should insert");
store
.upsert_thread_spawn_edge(
parent_thread_id,
first_child_thread_id,
ThreadSpawnEdgeStatus::Open,
)
.await
.expect("open child edge should insert");
let all_children = store
.list_thread_spawn_children(parent_thread_id, /*status_filter*/ None)
.await
.expect("all children should load");
assert_eq!(
all_children,
vec![first_child_thread_id, second_child_thread_id]
);
let open_children = store
.list_thread_spawn_children(parent_thread_id, Some(ThreadSpawnEdgeStatus::Open))
.await
.expect("open children should load");
let state_open_children = state_db
.list_thread_spawn_children_with_status(
parent_thread_id,
DirectionalThreadSpawnEdgeStatus::Open,
)
.await
.expect("state open children should load");
assert_eq!(open_children, state_open_children);
assert_eq!(open_children, vec![first_child_thread_id]);
let closed_children = store
.list_thread_spawn_children(parent_thread_id, Some(ThreadSpawnEdgeStatus::Closed))
.await
.expect("closed children should load");
assert_eq!(closed_children, vec![second_child_thread_id]);
}
#[tokio::test]
async fn local_store_updates_edge_status() {
let fixture = state_runtime().await;
let state_db = fixture.state_db;
let store = LocalAgentGraphStore::new(state_db);
let parent_thread_id = thread_id(/*suffix*/ 10);
let child_thread_id = thread_id(/*suffix*/ 11);
store
.upsert_thread_spawn_edge(
parent_thread_id,
child_thread_id,
ThreadSpawnEdgeStatus::Open,
)
.await
.expect("child edge should insert");
store
.set_thread_spawn_edge_status(child_thread_id, ThreadSpawnEdgeStatus::Closed)
.await
.expect("child edge should close");
let open_children = store
.list_thread_spawn_children(parent_thread_id, Some(ThreadSpawnEdgeStatus::Open))
.await
.expect("open children should load");
assert_eq!(open_children, Vec::<ThreadId>::new());
let closed_children = store
.list_thread_spawn_children(parent_thread_id, Some(ThreadSpawnEdgeStatus::Closed))
.await
.expect("closed children should load");
assert_eq!(closed_children, vec![child_thread_id]);
}
#[tokio::test]
async fn local_store_lists_descendants_breadth_first_with_status_filters() {
let fixture = state_runtime().await;
let state_db = fixture.state_db;
let store = LocalAgentGraphStore::new(state_db.clone());
let root_thread_id = thread_id(/*suffix*/ 20);
let later_child_thread_id = thread_id(/*suffix*/ 22);
let earlier_child_thread_id = thread_id(/*suffix*/ 21);
let closed_grandchild_thread_id = thread_id(/*suffix*/ 23);
let open_grandchild_thread_id = thread_id(/*suffix*/ 24);
let closed_child_thread_id = thread_id(/*suffix*/ 25);
let closed_great_grandchild_thread_id = thread_id(/*suffix*/ 26);
for (parent_thread_id, child_thread_id, status) in [
(
root_thread_id,
later_child_thread_id,
ThreadSpawnEdgeStatus::Open,
),
(
root_thread_id,
earlier_child_thread_id,
ThreadSpawnEdgeStatus::Open,
),
(
earlier_child_thread_id,
open_grandchild_thread_id,
ThreadSpawnEdgeStatus::Open,
),
(
later_child_thread_id,
closed_grandchild_thread_id,
ThreadSpawnEdgeStatus::Closed,
),
(
root_thread_id,
closed_child_thread_id,
ThreadSpawnEdgeStatus::Closed,
),
(
closed_child_thread_id,
closed_great_grandchild_thread_id,
ThreadSpawnEdgeStatus::Closed,
),
] {
store
.upsert_thread_spawn_edge(parent_thread_id, child_thread_id, status)
.await
.expect("edge should insert");
}
let all_descendants = store
.list_thread_spawn_descendants(root_thread_id, /*status_filter*/ None)
.await
.expect("all descendants should load");
assert_eq!(
all_descendants,
vec![
earlier_child_thread_id,
later_child_thread_id,
closed_child_thread_id,
closed_grandchild_thread_id,
open_grandchild_thread_id,
closed_great_grandchild_thread_id,
]
);
let open_descendants = store
.list_thread_spawn_descendants(root_thread_id, Some(ThreadSpawnEdgeStatus::Open))
.await
.expect("open descendants should load");
let state_open_descendants = state_db
.list_thread_spawn_descendants_with_status(
root_thread_id,
DirectionalThreadSpawnEdgeStatus::Open,
)
.await
.expect("state open descendants should load");
assert_eq!(open_descendants, state_open_descendants);
assert_eq!(
open_descendants,
vec![
earlier_child_thread_id,
later_child_thread_id,
open_grandchild_thread_id,
]
);
let closed_descendants = store
.list_thread_spawn_descendants(root_thread_id, Some(ThreadSpawnEdgeStatus::Closed))
.await
.expect("closed descendants should load");
assert_eq!(
closed_descendants,
vec![closed_child_thread_id, closed_great_grandchild_thread_id]
);
}
}

View File

@@ -0,0 +1,55 @@
use async_trait::async_trait;
use codex_protocol::ThreadId;
use crate::AgentGraphStoreResult;
use crate::ThreadSpawnEdgeStatus;
/// Storage-neutral boundary for persisted thread-spawn parent/child topology.
///
/// Implementations are expected to return stable ordering for list methods so callers can merge
/// persisted graph state with live in-memory state without introducing nondeterministic output.
#[async_trait]
pub trait AgentGraphStore: Send + Sync {
/// Insert or replace the directional parent/child edge for a spawned thread.
///
/// `child_thread_id` has at most one persisted parent. Re-inserting the same child should
/// update both the parent and status to match the supplied values.
async fn upsert_thread_spawn_edge(
&self,
parent_thread_id: ThreadId,
child_thread_id: ThreadId,
status: ThreadSpawnEdgeStatus,
) -> AgentGraphStoreResult<()>;
/// Update the persisted lifecycle status of a spawned thread's incoming edge.
///
/// Implementations should treat missing children as a successful no-op.
async fn set_thread_spawn_edge_status(
&self,
child_thread_id: ThreadId,
status: ThreadSpawnEdgeStatus,
) -> AgentGraphStoreResult<()>;
/// List direct spawned children of a parent thread.
///
/// When `status_filter` is `Some`, only child edges with that exact status are returned. When
/// it is `None`, all direct child edges are returned regardless of status, including statuses
/// that may be added by a future store implementation.
async fn list_thread_spawn_children(
&self,
parent_thread_id: ThreadId,
status_filter: Option<ThreadSpawnEdgeStatus>,
) -> AgentGraphStoreResult<Vec<ThreadId>>;
/// List spawned descendants breadth-first by depth, then by thread id.
///
/// `status_filter` is applied to every traversed edge, not just to the returned descendants.
/// For example, `Some(Open)` walks only open edges, so descendants under a closed edge are not
/// included even if their own incoming edge is open. `None` walks and returns every persisted
/// edge regardless of status.
async fn list_thread_spawn_descendants(
&self,
root_thread_id: ThreadId,
status_filter: Option<ThreadSpawnEdgeStatus>,
) -> AgentGraphStoreResult<Vec<ThreadId>>;
}

View File

@@ -0,0 +1,42 @@
use serde::Deserialize;
use serde::Serialize;
/// Lifecycle status attached to a directional thread-spawn edge.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum ThreadSpawnEdgeStatus {
/// The child thread is still live or resumable as an open spawned agent.
Open,
/// The child thread has been closed from the parent/child graph's perspective.
Closed,
}
#[cfg(test)]
mod tests {
use super::*;
use pretty_assertions::assert_eq;
#[test]
fn thread_spawn_edge_status_serializes_as_snake_case() {
assert_eq!(
serde_json::to_string(&ThreadSpawnEdgeStatus::Open)
.expect("open status should serialize"),
"\"open\""
);
assert_eq!(
serde_json::to_string(&ThreadSpawnEdgeStatus::Closed)
.expect("closed status should serialize"),
"\"closed\""
);
assert_eq!(
serde_json::from_str::<ThreadSpawnEdgeStatus>("\"open\"")
.expect("open status should deserialize"),
ThreadSpawnEdgeStatus::Open
);
assert_eq!(
serde_json::from_str::<ThreadSpawnEdgeStatus>("\"closed\"")
.expect("closed status should deserialize"),
ThreadSpawnEdgeStatus::Closed
);
}
}

View File

@@ -0,0 +1,6 @@
load("//:defs.bzl", "codex_rust_crate")
codex_rust_crate(
name = "agent-identity",
crate_name = "codex_agent_identity",
)

View File

@@ -0,0 +1,30 @@
[package]
edition.workspace = true
license.workspace = true
name = "codex-agent-identity"
version.workspace = true
[lib]
doctest = false
name = "codex_agent_identity"
path = "src/lib.rs"
[lints]
workspace = true
[dependencies]
anyhow = { workspace = true }
base64 = { workspace = true }
chrono = { workspace = true }
codex-protocol = { workspace = true }
crypto_box = { workspace = true }
ed25519-dalek = { workspace = true }
jsonwebtoken = { workspace = true }
rand = { workspace = true }
reqwest = { workspace = true, features = ["json"] }
serde = { workspace = true, features = ["derive"] }
serde_json = { workspace = true }
sha2 = { workspace = true }
[dev-dependencies]
pretty_assertions = { workspace = true }

View File

@@ -0,0 +1,737 @@
use std::collections::BTreeMap;
use std::time::Duration;
use anyhow::Context;
use anyhow::Result;
use base64::Engine as _;
use base64::engine::general_purpose::STANDARD as BASE64_STANDARD;
use base64::engine::general_purpose::URL_SAFE_NO_PAD;
use chrono::SecondsFormat;
use chrono::Utc;
use codex_protocol::auth::PlanType as AuthPlanType;
use codex_protocol::protocol::SessionSource;
use crypto_box::SecretKey as Curve25519SecretKey;
use ed25519_dalek::Signer as _;
use ed25519_dalek::SigningKey;
use ed25519_dalek::VerifyingKey;
use ed25519_dalek::pkcs8::DecodePrivateKey;
use ed25519_dalek::pkcs8::EncodePrivateKey;
use jsonwebtoken::Algorithm;
use jsonwebtoken::DecodingKey;
use jsonwebtoken::Validation;
use jsonwebtoken::decode;
use jsonwebtoken::decode_header;
use jsonwebtoken::jwk::JwkSet;
use rand::TryRngCore;
use rand::rngs::OsRng;
use serde::Deserialize;
use serde::Serialize;
use serde::de::DeserializeOwned;
use sha2::Digest as _;
use sha2::Sha512;
const AGENT_TASK_REGISTRATION_TIMEOUT: Duration = Duration::from_secs(30);
const AGENT_IDENTITY_JWKS_TIMEOUT: Duration = Duration::from_secs(10);
const AGENT_IDENTITY_JWT_AUDIENCE: &str = "codex-app-server";
const AGENT_IDENTITY_JWT_ISSUER: &str = "https://chatgpt.com/codex-backend/agent-identity";
/// Stored key material for a registered agent identity.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AgentIdentityKey<'a> {
pub agent_runtime_id: &'a str,
pub private_key_pkcs8_base64: &'a str,
}
/// Task binding to use when constructing a task-scoped AgentAssertion.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AgentTaskAuthorizationTarget<'a> {
pub agent_runtime_id: &'a str,
pub task_id: &'a str,
}
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq, Eq)]
pub struct AgentBillOfMaterials {
pub agent_version: String,
pub agent_harness_id: String,
pub running_location: String,
}
pub struct GeneratedAgentKeyMaterial {
pub private_key_pkcs8_base64: String,
pub public_key_ssh: String,
}
/// Claims carried by an Agent Identity JWT.
#[derive(Clone, Debug, Deserialize, PartialEq, Eq)]
pub struct AgentIdentityJwtClaims {
pub iss: String,
pub aud: String,
pub iat: usize,
pub exp: usize,
pub agent_runtime_id: String,
pub agent_private_key: String,
pub account_id: String,
pub chatgpt_user_id: String,
pub email: String,
pub plan_type: AuthPlanType,
pub chatgpt_account_is_fedramp: bool,
}
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq, Eq)]
struct AgentAssertionEnvelope {
agent_runtime_id: String,
task_id: String,
timestamp: String,
signature: String,
}
#[derive(Serialize)]
struct RegisterTaskRequest {
timestamp: String,
signature: String,
}
#[derive(Deserialize)]
struct RegisterTaskResponse {
#[serde(default)]
task_id: Option<String>,
#[serde(default, rename = "taskId")]
task_id_camel: Option<String>,
#[serde(default)]
encrypted_task_id: Option<String>,
#[serde(default, rename = "encryptedTaskId")]
encrypted_task_id_camel: Option<String>,
}
pub fn authorization_header_for_agent_task(
key: AgentIdentityKey<'_>,
target: AgentTaskAuthorizationTarget<'_>,
) -> Result<String> {
anyhow::ensure!(
key.agent_runtime_id == target.agent_runtime_id,
"agent task runtime {} does not match stored agent identity {}",
target.agent_runtime_id,
key.agent_runtime_id
);
let timestamp = Utc::now().to_rfc3339_opts(SecondsFormat::Secs, true);
let envelope = AgentAssertionEnvelope {
agent_runtime_id: target.agent_runtime_id.to_string(),
task_id: target.task_id.to_string(),
timestamp: timestamp.clone(),
signature: sign_agent_assertion_payload(key, target.task_id, &timestamp)?,
};
let serialized_assertion = serialize_agent_assertion(&envelope)?;
Ok(format!("AgentAssertion {serialized_assertion}"))
}
pub async fn fetch_agent_identity_jwks(
client: &reqwest::Client,
chatgpt_base_url: &str,
) -> Result<JwkSet> {
let response = client
.get(agent_identity_jwks_url(chatgpt_base_url))
.timeout(AGENT_IDENTITY_JWKS_TIMEOUT)
.send()
.await
.context("failed to request agent identity JWKS")?
.error_for_status()
.context("agent identity JWKS endpoint returned an error")?;
response
.json()
.await
.context("failed to decode agent identity JWKS")
}
pub fn decode_agent_identity_jwt(
jwt: &str,
jwks: Option<&JwkSet>,
) -> Result<AgentIdentityJwtClaims> {
let Some(jwks) = jwks else {
return decode_agent_identity_jwt_payload(jwt);
};
let header = decode_header(jwt).context("failed to decode agent identity JWT header")?;
let kid = header
.kid
.context("agent identity JWT header does not include a kid")?;
let jwk = jwks
.find(&kid)
.with_context(|| format!("agent identity JWT kid {kid} is not trusted"))?;
let decoding_key = DecodingKey::from_jwk(jwk).context("failed to build JWT decoding key")?;
let mut validation = Validation::new(Algorithm::RS256);
validation.set_audience(&[AGENT_IDENTITY_JWT_AUDIENCE]);
validation.set_issuer(&[AGENT_IDENTITY_JWT_ISSUER]);
validation.required_spec_claims.insert("iss".to_string());
validation.required_spec_claims.insert("aud".to_string());
decode::<AgentIdentityJwtClaims>(jwt, &decoding_key, &validation)
.map(|data| data.claims)
.context("failed to verify agent identity JWT")
}
fn decode_agent_identity_jwt_payload<T: DeserializeOwned>(jwt: &str) -> Result<T> {
let mut parts = jwt.split('.');
let (_header_b64, payload_b64, _sig_b64) = match (parts.next(), parts.next(), parts.next()) {
(Some(h), Some(p), Some(s)) if !h.is_empty() && !p.is_empty() && !s.is_empty() => (h, p, s),
_ => anyhow::bail!("invalid agent identity JWT format"),
};
anyhow::ensure!(parts.next().is_none(), "invalid agent identity JWT format");
let payload_bytes = URL_SAFE_NO_PAD
.decode(payload_b64)
.context("agent identity JWT payload is not valid base64url")?;
serde_json::from_slice(&payload_bytes).context("agent identity JWT payload is not valid JSON")
}
pub fn sign_task_registration_payload(
key: AgentIdentityKey<'_>,
timestamp: &str,
) -> Result<String> {
let signing_key = signing_key_from_private_key_pkcs8_base64(key.private_key_pkcs8_base64)?;
let payload = format!("{}:{timestamp}", key.agent_runtime_id);
Ok(BASE64_STANDARD.encode(signing_key.sign(payload.as_bytes()).to_bytes()))
}
pub async fn register_agent_task(
client: &reqwest::Client,
chatgpt_base_url: &str,
key: AgentIdentityKey<'_>,
) -> Result<String> {
let timestamp = Utc::now().to_rfc3339_opts(SecondsFormat::Secs, true);
let request = RegisterTaskRequest {
signature: sign_task_registration_payload(key, &timestamp)?,
timestamp,
};
let url = agent_task_registration_url(chatgpt_base_url, key.agent_runtime_id);
let response = client
.post(url)
.timeout(AGENT_TASK_REGISTRATION_TIMEOUT)
.json(&request)
.send()
.await
.context("failed to register agent task")?;
if !response.status().is_success() {
let status = response.status();
let body = response.text().await.unwrap_or_default();
let body = if body.len() > 512 {
format!("{}...", body.chars().take(512).collect::<String>())
} else {
body
};
anyhow::bail!("failed to register agent task with status {status}: {body}");
}
let response = response
.json()
.await
.context("failed to decode agent task registration response")?;
task_id_from_register_task_response(key, response)
}
fn task_id_from_register_task_response(
key: AgentIdentityKey<'_>,
response: RegisterTaskResponse,
) -> Result<String> {
if let Some(task_id) = response.task_id.or(response.task_id_camel) {
return Ok(task_id);
}
let encrypted_task_id = response
.encrypted_task_id
.or(response.encrypted_task_id_camel)
.context("agent task registration response omitted task id")?;
decrypt_task_id_response(key, &encrypted_task_id)
}
pub fn decrypt_task_id_response(
key: AgentIdentityKey<'_>,
encrypted_task_id: &str,
) -> Result<String> {
let signing_key = signing_key_from_private_key_pkcs8_base64(key.private_key_pkcs8_base64)?;
let ciphertext = BASE64_STANDARD
.decode(encrypted_task_id)
.context("encrypted task id is not valid base64")?;
let plaintext = curve25519_secret_key_from_signing_key(&signing_key)
.unseal(&ciphertext)
.map_err(|_| anyhow::anyhow!("failed to decrypt encrypted task id"))?;
String::from_utf8(plaintext).context("decrypted task id is not valid UTF-8")
}
pub fn generate_agent_key_material() -> Result<GeneratedAgentKeyMaterial> {
let mut secret_key_bytes = [0u8; 32];
OsRng
.try_fill_bytes(&mut secret_key_bytes)
.context("failed to generate agent identity private key bytes")?;
let signing_key = SigningKey::from_bytes(&secret_key_bytes);
let private_key_pkcs8 = signing_key
.to_pkcs8_der()
.context("failed to encode agent identity private key as PKCS#8")?;
Ok(GeneratedAgentKeyMaterial {
private_key_pkcs8_base64: BASE64_STANDARD.encode(private_key_pkcs8.as_bytes()),
public_key_ssh: encode_ssh_ed25519_public_key(&signing_key.verifying_key()),
})
}
pub fn public_key_ssh_from_private_key_pkcs8_base64(
private_key_pkcs8_base64: &str,
) -> Result<String> {
let signing_key = signing_key_from_private_key_pkcs8_base64(private_key_pkcs8_base64)?;
Ok(encode_ssh_ed25519_public_key(&signing_key.verifying_key()))
}
pub fn verifying_key_from_private_key_pkcs8_base64(
private_key_pkcs8_base64: &str,
) -> Result<VerifyingKey> {
let signing_key = signing_key_from_private_key_pkcs8_base64(private_key_pkcs8_base64)?;
Ok(signing_key.verifying_key())
}
pub fn curve25519_secret_key_from_private_key_pkcs8_base64(
private_key_pkcs8_base64: &str,
) -> Result<Curve25519SecretKey> {
let signing_key = signing_key_from_private_key_pkcs8_base64(private_key_pkcs8_base64)?;
Ok(curve25519_secret_key_from_signing_key(&signing_key))
}
pub fn agent_registration_url(chatgpt_base_url: &str) -> String {
let trimmed = chatgpt_base_url.trim_end_matches('/');
format!("{trimmed}/v1/agent/register")
}
pub fn agent_task_registration_url(chatgpt_base_url: &str, agent_runtime_id: &str) -> String {
let trimmed = chatgpt_base_url.trim_end_matches('/');
format!("{trimmed}/v1/agent/{agent_runtime_id}/task/register")
}
pub fn agent_identity_biscuit_url(chatgpt_base_url: &str) -> String {
let trimmed = chatgpt_base_url.trim_end_matches('/');
format!("{trimmed}/authenticate_app_v2")
}
pub fn agent_identity_jwks_url(chatgpt_base_url: &str) -> String {
let trimmed = chatgpt_base_url.trim_end_matches('/');
if trimmed.contains("/backend-api") {
format!("{trimmed}/wham/agent-identities/jwks")
} else {
format!("{trimmed}/agent-identities/jwks")
}
}
pub fn agent_identity_request_id() -> Result<String> {
let mut request_id_bytes = [0u8; 16];
OsRng
.try_fill_bytes(&mut request_id_bytes)
.context("failed to generate agent identity request id")?;
Ok(format!(
"codex-agent-identity-{}",
URL_SAFE_NO_PAD.encode(request_id_bytes)
))
}
pub fn build_abom(session_source: SessionSource) -> AgentBillOfMaterials {
AgentBillOfMaterials {
agent_version: env!("CARGO_PKG_VERSION").to_string(),
agent_harness_id: match &session_source {
SessionSource::VSCode => "codex-app".to_string(),
SessionSource::Cli
| SessionSource::Exec
| SessionSource::Mcp
| SessionSource::Custom(_)
| SessionSource::Internal(_)
| SessionSource::SubAgent(_)
| SessionSource::Unknown => "codex-cli".to_string(),
},
running_location: format!("{}-{}", session_source, std::env::consts::OS),
}
}
pub fn encode_ssh_ed25519_public_key(verifying_key: &VerifyingKey) -> String {
let mut blob = Vec::with_capacity(4 + 11 + 4 + 32);
append_ssh_string(&mut blob, b"ssh-ed25519");
append_ssh_string(&mut blob, verifying_key.as_bytes());
format!("ssh-ed25519 {}", BASE64_STANDARD.encode(blob))
}
fn sign_agent_assertion_payload(
key: AgentIdentityKey<'_>,
task_id: &str,
timestamp: &str,
) -> Result<String> {
let signing_key = signing_key_from_private_key_pkcs8_base64(key.private_key_pkcs8_base64)?;
let payload = format!("{}:{task_id}:{timestamp}", key.agent_runtime_id);
Ok(BASE64_STANDARD.encode(signing_key.sign(payload.as_bytes()).to_bytes()))
}
fn serialize_agent_assertion(envelope: &AgentAssertionEnvelope) -> Result<String> {
let payload = serde_json::to_vec(&BTreeMap::from([
("agent_runtime_id", envelope.agent_runtime_id.as_str()),
("signature", envelope.signature.as_str()),
("task_id", envelope.task_id.as_str()),
("timestamp", envelope.timestamp.as_str()),
]))
.context("failed to serialize agent assertion envelope")?;
Ok(URL_SAFE_NO_PAD.encode(payload))
}
fn curve25519_secret_key_from_signing_key(signing_key: &SigningKey) -> Curve25519SecretKey {
let digest = Sha512::digest(signing_key.to_bytes());
let mut secret_key = [0u8; 32];
secret_key.copy_from_slice(&digest[..32]);
secret_key[0] &= 248;
secret_key[31] &= 127;
secret_key[31] |= 64;
Curve25519SecretKey::from(secret_key)
}
fn append_ssh_string(buf: &mut Vec<u8>, value: &[u8]) {
buf.extend_from_slice(&(value.len() as u32).to_be_bytes());
buf.extend_from_slice(value);
}
fn signing_key_from_private_key_pkcs8_base64(private_key_pkcs8_base64: &str) -> Result<SigningKey> {
let private_key = BASE64_STANDARD
.decode(private_key_pkcs8_base64)
.context("stored agent identity private key is not valid base64")?;
SigningKey::from_pkcs8_der(&private_key)
.context("stored agent identity private key is not valid PKCS#8")
}
#[cfg(test)]
mod tests {
use base64::Engine as _;
use ed25519_dalek::Signature;
use ed25519_dalek::Verifier as _;
use jsonwebtoken::EncodingKey;
use jsonwebtoken::Header;
use pretty_assertions::assert_eq;
use codex_protocol::auth::KnownPlan;
use super::*;
#[test]
fn authorization_header_for_agent_task_serializes_signed_agent_assertion() {
let signing_key = SigningKey::from_bytes(&[7u8; 32]);
let private_key = signing_key
.to_pkcs8_der()
.expect("encode test key material");
let key = AgentIdentityKey {
agent_runtime_id: "agent-123",
private_key_pkcs8_base64: &BASE64_STANDARD.encode(private_key.as_bytes()),
};
let target = AgentTaskAuthorizationTarget {
agent_runtime_id: "agent-123",
task_id: "task-123",
};
let header =
authorization_header_for_agent_task(key, target).expect("build agent assertion header");
let token = header
.strip_prefix("AgentAssertion ")
.expect("agent assertion scheme");
let payload = URL_SAFE_NO_PAD
.decode(token)
.expect("valid base64url payload");
let envelope: AgentAssertionEnvelope =
serde_json::from_slice(&payload).expect("valid assertion envelope");
assert_eq!(
envelope,
AgentAssertionEnvelope {
agent_runtime_id: "agent-123".to_string(),
task_id: "task-123".to_string(),
timestamp: envelope.timestamp.clone(),
signature: envelope.signature.clone(),
}
);
let signature_bytes = BASE64_STANDARD
.decode(&envelope.signature)
.expect("valid base64 signature");
let signature = Signature::from_slice(&signature_bytes).expect("valid signature bytes");
signing_key
.verifying_key()
.verify(
format!(
"{}:{}:{}",
envelope.agent_runtime_id, envelope.task_id, envelope.timestamp
)
.as_bytes(),
&signature,
)
.expect("signature should verify");
}
#[test]
fn authorization_header_for_agent_task_rejects_mismatched_runtime() {
let signing_key = SigningKey::from_bytes(&[7u8; 32]);
let private_key = signing_key
.to_pkcs8_der()
.expect("encode test key material");
let private_key_pkcs8_base64 = BASE64_STANDARD.encode(private_key.as_bytes());
let key = AgentIdentityKey {
agent_runtime_id: "agent-123",
private_key_pkcs8_base64: &private_key_pkcs8_base64,
};
let target = AgentTaskAuthorizationTarget {
agent_runtime_id: "agent-456",
task_id: "task-123",
};
let error = authorization_header_for_agent_task(key, target)
.expect_err("runtime mismatch should fail");
assert_eq!(
error.to_string(),
"agent task runtime agent-456 does not match stored agent identity agent-123"
);
}
#[test]
fn decode_agent_identity_jwt_reads_claims() {
let jwt = jwt_with_payload(serde_json::json!({
"iss": AGENT_IDENTITY_JWT_ISSUER,
"aud": AGENT_IDENTITY_JWT_AUDIENCE,
"iat": 1_700_000_000usize,
"exp": 4_000_000_000usize,
"agent_runtime_id": "agent-runtime-id",
"agent_private_key": "private-key",
"account_id": "account-id",
"chatgpt_user_id": "user-id",
"email": "user@example.com",
"plan_type": "pro",
"chatgpt_account_is_fedramp": false,
}));
let claims = decode_agent_identity_jwt(&jwt, /*jwks*/ None).expect("JWT should decode");
assert_eq!(
claims,
AgentIdentityJwtClaims {
iss: AGENT_IDENTITY_JWT_ISSUER.to_string(),
aud: AGENT_IDENTITY_JWT_AUDIENCE.to_string(),
iat: 1_700_000_000,
exp: 4_000_000_000,
agent_runtime_id: "agent-runtime-id".to_string(),
agent_private_key: "private-key".to_string(),
account_id: "account-id".to_string(),
chatgpt_user_id: "user-id".to_string(),
email: "user@example.com".to_string(),
plan_type: AuthPlanType::Known(KnownPlan::Pro),
chatgpt_account_is_fedramp: false,
}
);
}
#[test]
fn decode_agent_identity_jwt_maps_raw_plan_aliases() {
let jwt = jwt_with_payload(serde_json::json!({
"iss": AGENT_IDENTITY_JWT_ISSUER,
"aud": AGENT_IDENTITY_JWT_AUDIENCE,
"iat": 1_700_000_000usize,
"exp": 4_000_000_000usize,
"agent_runtime_id": "agent-runtime-id",
"agent_private_key": "private-key",
"account_id": "account-id",
"chatgpt_user_id": "user-id",
"email": "user@example.com",
"plan_type": "hc",
"chatgpt_account_is_fedramp": false,
}));
let claims = decode_agent_identity_jwt(&jwt, /*jwks*/ None).expect("JWT should decode");
assert_eq!(claims.plan_type, AuthPlanType::Known(KnownPlan::Enterprise));
}
#[test]
fn decode_agent_identity_jwt_verifies_when_jwks_is_present() {
let jwks = test_jwks("test-key");
let claims = AgentIdentityJwtClaims {
iss: AGENT_IDENTITY_JWT_ISSUER.to_string(),
aud: AGENT_IDENTITY_JWT_AUDIENCE.to_string(),
iat: 1_700_000_000,
exp: 4_000_000_000,
agent_runtime_id: "agent-runtime-id".to_string(),
agent_private_key: "private-key".to_string(),
account_id: "account-id".to_string(),
chatgpt_user_id: "user-id".to_string(),
email: "user@example.com".to_string(),
plan_type: AuthPlanType::Known(KnownPlan::Pro),
chatgpt_account_is_fedramp: false,
};
let jwt = jsonwebtoken::encode(
&test_jwt_header("test-key"),
&serde_json::json!({
"iss": claims.iss,
"aud": claims.aud,
"iat": claims.iat,
"exp": claims.exp,
"agent_runtime_id": claims.agent_runtime_id,
"agent_private_key": claims.agent_private_key,
"account_id": claims.account_id,
"chatgpt_user_id": claims.chatgpt_user_id,
"email": claims.email,
"plan_type": "pro",
"chatgpt_account_is_fedramp": claims.chatgpt_account_is_fedramp,
}),
&test_rsa_encoding_key(),
)
.expect("JWT should encode");
let expected_claims = AgentIdentityJwtClaims {
iss: AGENT_IDENTITY_JWT_ISSUER.to_string(),
aud: AGENT_IDENTITY_JWT_AUDIENCE.to_string(),
iat: 1_700_000_000,
exp: 4_000_000_000,
agent_runtime_id: "agent-runtime-id".to_string(),
agent_private_key: "private-key".to_string(),
account_id: "account-id".to_string(),
chatgpt_user_id: "user-id".to_string(),
email: "user@example.com".to_string(),
plan_type: AuthPlanType::Known(KnownPlan::Pro),
chatgpt_account_is_fedramp: false,
};
assert_eq!(
decode_agent_identity_jwt(&jwt, Some(&jwks)).expect("JWT should verify"),
expected_claims
);
}
#[test]
fn decode_agent_identity_jwt_rejects_untrusted_kid() {
let jwks = test_jwks("other-key");
let jwt = jsonwebtoken::encode(
&test_jwt_header("test-key"),
&serde_json::json!({
"iss": AGENT_IDENTITY_JWT_ISSUER,
"aud": AGENT_IDENTITY_JWT_AUDIENCE,
"iat": 1_700_000_000,
"exp": 4_000_000_000usize,
"agent_runtime_id": "agent-runtime-id",
"agent_private_key": "private-key",
"account_id": "account-id",
"chatgpt_user_id": "user-id",
"email": "user@example.com",
"plan_type": "pro",
"chatgpt_account_is_fedramp": false,
}),
&test_rsa_encoding_key(),
)
.expect("JWT should encode");
decode_agent_identity_jwt(&jwt, Some(&jwks)).expect_err("JWT should not verify");
}
#[test]
fn decode_agent_identity_jwt_requires_issuer_and_audience() {
let jwks = test_jwks("test-key");
let jwt = jsonwebtoken::encode(
&test_jwt_header("test-key"),
&serde_json::json!({
"iat": 1_700_000_000,
"exp": 4_000_000_000usize,
"agent_runtime_id": "agent-runtime-id",
"agent_private_key": "private-key",
"account_id": "account-id",
"chatgpt_user_id": "user-id",
"email": "user@example.com",
"plan_type": "pro",
"chatgpt_account_is_fedramp": false,
}),
&test_rsa_encoding_key(),
)
.expect("JWT should encode");
decode_agent_identity_jwt(&jwt, Some(&jwks)).expect_err("JWT should not verify");
}
fn test_jwt_header(kid: &str) -> Header {
let mut header = Header::new(Algorithm::RS256);
header.kid = Some(kid.to_string());
header
}
fn test_rsa_encoding_key() -> EncodingKey {
EncodingKey::from_rsa_pem(
br#"-----BEGIN PRIVATE KEY-----
MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQDWpAXYypOsYAwO
bvBduMk/mxaoYDze0AZSzaSzLuIlcsl2EKDgC3AabhIWXh/qTGEJLOU3VB1e5mO9
FPbBlmIZSL3FQTbyt/hYutPFKfCou5PLmScw/TzILS3/RhT8UY9kxxZvXiEbTki9
mvxRuZFpVqDFJHwfitIjKZGhXDCYVKurPTrxetYZJg0h8sQBLKjkZ0BqqaTUkAsg
0eBgZAlXEzG3By8PGhUqYLt6W1Q3KYw0FmGy/gTyzH1g0ukGgSJvOd8SkNT8MbOs
zl5kKxDNqpuEE6UZ3jbuJ+5382d31w+rOAJRzbf7QVdI9+luCSwJcDACYPQ4WNBa
uCpV0ovpAgMBAAECggEAVu84LwZdqYN9XpswX8VoPYrjMm9IODapWQBRpQFoNyK2
1ksF3bjEPvA2Azk8U/l7k+vLKw22l6lY3EyRZPcz5GnB8xLm3ogE3mtNOp4yCyVu
RxhQ91aaN7mU17/a4BdorLi2LYVCg3zBmYociD1Q2AluNGsCmwPu+K7tfR2J0Sg8
NjqiTbDG1XDpR/icwgC9t6vh8lZpCHDhF4tbQfLLVLeA/OdcuzXDyMCXbmdVIdBQ
rm4aIFmr2e1/2ctTbCg85S6AGFTH+pSLjrwTzyvf+F6NW5uNjLQAQLFj+EznBDxj
Xdx90cySrjsKK6PVWQF4RiTvkSW8eWL7R6B2FZbGwQKBgQDuVQRj72hWloR7mbEL
aUEEv3pIXTMXWEsoMBNczos/1L1RnAN1AI44TurznasPZAWvQj+kVbLDR+TAeZrL
iA8HIWswQUI18hFmgKzSkwIXGtubcKVrgsKeS4lMDKCM/Ef6WAYdeq6ronoY5lCN
YrJFmGp81W5zcV7lyiycgbSiGwKBgQDmjWYf6pZjrK7Z+OJ3X1AZfi2vss15SCvL
3fPgzIDbViztpGyQhc3DQZIsBNIu0xZp/veGce9TEeTds2ro9NfdJFeou8+fC7Pq
sOsM3amGFFi+ZW/9BWyjZEM88bgWWAjqLHbpfHDxjAf5CSxddqxgHlbP0Ytyb1Vg
gmPDn9YKSwKBgQDbTi3hC35WFuDHn0/zcSHcDZmnFuOZeqyFyV83yfMGhGrEuqvP
sPgtRikajJ3IZsB4WZyYSidZXEFY/0z6NjOl2xF38MTNQPbT/FmK1q1Yt2UWrlv5
BvSwlk87RG9D7C0LZo4R+D7cPoDdgqjiwMvMEIkEX5zn641oI1ZTmWKuuwKBgQCD
KF+3unnRvHRAVoFnTZbA2fJdqMeRvogD04GhGlYX8V9f1hFY6nXTJaNlXVzA/J8c
r8ra9kgjJuPfZ+ljG58OFFW2DRohLcQtuHYPfK6rMzoFHqnl9EcIcMp7ijuionR3
29HOJFgQYgxLFXfit9d6WugiE+BTupiEbckZif13HwKBgE/lAlkVHP6YahOO2Ljc
J1bwkqKZTB5dHolX9A58e/xXnfZ5P8f3Z83+Izap3FwqQulk7b1WO1MQcHuVg2NN
5da0D4h2rYOXnbYIg0BVu4spQbaM6ewsp66b8+MzLOBvj8SzWdt1Oyw0q/MRyQAR
8U4M2TSWCKUY/A6sT4W8+mT9
-----END PRIVATE KEY-----"#,
)
.expect("test RSA key should parse")
}
fn test_jwks(kid: &str) -> jsonwebtoken::jwk::JwkSet {
serde_json::from_value(serde_json::json!({
"keys": [{
"kty": "RSA",
"kid": kid,
"use": "sig",
"alg": "RS256",
"n": "1qQF2MqTrGAMDm7wXbjJP5sWqGA83tAGUs2ksy7iJXLJdhCg4AtwGm4SFl4f6kxhCSzlN1QdXuZjvRT2wZZiGUi9xUE28rf4WLrTxSnwqLuTy5knMP08yC0t_0YU_FGPZMcWb14hG05IvZr8UbmRaVagxSR8H4rSIymRoVwwmFSrqz068XrWGSYNIfLEASyo5GdAaqmk1JALINHgYGQJVxMxtwcvDxoVKmC7eltUNymMNBZhsv4E8sx9YNLpBoEibznfEpDU_DGzrM5eZCsQzaqbhBOlGd427ifud_Nnd9cPqzgCUc23-0FXSPfpbgksCXAwAmD0OFjQWrgqVdKL6Q",
"e": "AQAB",
}]
}))
.expect("test JWKS should parse")
}
#[test]
fn agent_identity_jwks_url_uses_backend_api_base_url() {
assert_eq!(
agent_identity_jwks_url("https://chatgpt.com/backend-api"),
"https://chatgpt.com/backend-api/wham/agent-identities/jwks"
);
assert_eq!(
agent_identity_jwks_url("https://chatgpt.com/backend-api/"),
"https://chatgpt.com/backend-api/wham/agent-identities/jwks"
);
}
#[test]
fn agent_identity_jwks_url_uses_codex_api_base_url() {
assert_eq!(
agent_identity_jwks_url("http://localhost:8080/api/codex"),
"http://localhost:8080/api/codex/agent-identities/jwks"
);
assert_eq!(
agent_identity_jwks_url("http://localhost:8080/api/codex/"),
"http://localhost:8080/api/codex/agent-identities/jwks"
);
}
fn jwt_with_payload(payload: serde_json::Value) -> String {
let encode = |bytes: &[u8]| URL_SAFE_NO_PAD.encode(bytes);
let header_b64 = encode(br#"{"alg":"none","typ":"JWT"}"#);
let payload_b64 = encode(&serde_json::to_vec(&payload).expect("payload should serialize"));
let signature_b64 = encode(b"sig");
format!("{header_b64}.{payload_b64}.{signature_b64}")
}
}

View File

@@ -16,6 +16,7 @@ workspace = true
codex-app-server-protocol = { workspace = true }
codex-git-utils = { workspace = true }
codex-login = { workspace = true }
codex-model-provider = { workspace = true }
codex-plugin = { workspace = true }
codex-protocol = { workspace = true }
os_info = { workspace = true }

View File

@@ -59,7 +59,7 @@ use codex_app_server_protocol::ApprovalsReviewer as AppServerApprovalsReviewer;
use codex_app_server_protocol::AskForApproval as AppServerAskForApproval;
use codex_app_server_protocol::ClientInfo;
use codex_app_server_protocol::ClientRequest;
use codex_app_server_protocol::ClientResponse;
use codex_app_server_protocol::ClientResponsePayload;
use codex_app_server_protocol::CodexErrorInfo;
use codex_app_server_protocol::InitializeCapabilities;
use codex_app_server_protocol::InitializeParams;
@@ -70,6 +70,8 @@ use codex_app_server_protocol::SandboxPolicy as AppServerSandboxPolicy;
use codex_app_server_protocol::ServerNotification;
use codex_app_server_protocol::SessionSource as AppServerSessionSource;
use codex_app_server_protocol::Thread;
use codex_app_server_protocol::ThreadArchiveParams;
use codex_app_server_protocol::ThreadArchiveResponse;
use codex_app_server_protocol::ThreadResumeResponse;
use codex_app_server_protocol::ThreadStartResponse;
use codex_app_server_protocol::ThreadStatus as AppServerThreadStatus;
@@ -91,6 +93,7 @@ use codex_plugin::PluginTelemetryMetadata;
use codex_protocol::approvals::NetworkApprovalProtocol;
use codex_protocol::config_types::ApprovalsReviewer;
use codex_protocol::config_types::ModeKind;
use codex_protocol::models::PermissionProfile as CorePermissionProfile;
use codex_protocol::protocol::AskForApproval;
use codex_protocol::protocol::HookEventName;
use codex_protocol::protocol::HookRunStatus;
@@ -139,22 +142,25 @@ fn sample_thread_with_source(
}
}
fn sample_thread_start_response(thread_id: &str, ephemeral: bool, model: &str) -> ClientResponse {
ClientResponse::ThreadStart {
request_id: RequestId::Integer(1),
response: ThreadStartResponse {
thread: sample_thread(thread_id, ephemeral),
model: model.to_string(),
model_provider: "openai".to_string(),
service_tier: None,
cwd: test_path_buf("/tmp").abs(),
instruction_sources: Vec::new(),
approval_policy: AppServerAskForApproval::OnFailure,
approvals_reviewer: AppServerApprovalsReviewer::User,
sandbox: AppServerSandboxPolicy::DangerFullAccess,
reasoning_effort: None,
},
}
fn sample_thread_start_response(
thread_id: &str,
ephemeral: bool,
model: &str,
) -> ClientResponsePayload {
ClientResponsePayload::ThreadStart(ThreadStartResponse {
thread: sample_thread(thread_id, ephemeral),
model: model.to_string(),
model_provider: "openai".to_string(),
service_tier: None,
cwd: test_path_buf("/tmp").abs(),
instruction_sources: Vec::new(),
approval_policy: AppServerAskForApproval::OnFailure,
approvals_reviewer: AppServerApprovalsReviewer::User,
sandbox: AppServerSandboxPolicy::DangerFullAccess,
permission_profile: None,
active_permission_profile: None,
reasoning_effort: None,
})
}
fn sample_app_server_client_metadata() -> CodexAppServerClientMetadata {
@@ -176,7 +182,11 @@ fn sample_runtime_metadata() -> CodexRuntimeMetadata {
}
}
fn sample_thread_resume_response(thread_id: &str, ephemeral: bool, model: &str) -> ClientResponse {
fn sample_thread_resume_response(
thread_id: &str,
ephemeral: bool,
model: &str,
) -> ClientResponsePayload {
sample_thread_resume_response_with_source(
thread_id,
ephemeral,
@@ -190,22 +200,21 @@ fn sample_thread_resume_response_with_source(
ephemeral: bool,
model: &str,
source: AppServerSessionSource,
) -> ClientResponse {
ClientResponse::ThreadResume {
request_id: RequestId::Integer(2),
response: ThreadResumeResponse {
thread: sample_thread_with_source(thread_id, ephemeral, source),
model: model.to_string(),
model_provider: "openai".to_string(),
service_tier: None,
cwd: test_path_buf("/tmp").abs(),
instruction_sources: Vec::new(),
approval_policy: AppServerAskForApproval::OnFailure,
approvals_reviewer: AppServerApprovalsReviewer::User,
sandbox: AppServerSandboxPolicy::DangerFullAccess,
reasoning_effort: None,
},
}
) -> ClientResponsePayload {
ClientResponsePayload::ThreadResume(ThreadResumeResponse {
thread: sample_thread_with_source(thread_id, ephemeral, source),
model: model.to_string(),
model_provider: "openai".to_string(),
service_tier: None,
cwd: test_path_buf("/tmp").abs(),
instruction_sources: Vec::new(),
approval_policy: AppServerAskForApproval::OnFailure,
approvals_reviewer: AppServerApprovalsReviewer::User,
sandbox: AppServerSandboxPolicy::DangerFullAccess,
permission_profile: None,
active_permission_profile: None,
reasoning_effort: None,
})
}
fn sample_turn_start_request(thread_id: &str, request_id: i64) -> ClientRequest {
@@ -227,21 +236,18 @@ fn sample_turn_start_request(thread_id: &str, request_id: i64) -> ClientRequest
}
}
fn sample_turn_start_response(turn_id: &str, request_id: i64) -> ClientResponse {
ClientResponse::TurnStart {
request_id: RequestId::Integer(request_id),
response: codex_app_server_protocol::TurnStartResponse {
turn: Turn {
id: turn_id.to_string(),
items: vec![],
status: AppServerTurnStatus::InProgress,
error: None,
started_at: None,
completed_at: None,
duration_ms: None,
},
fn sample_turn_start_response(turn_id: &str) -> ClientResponsePayload {
ClientResponsePayload::TurnStart(codex_app_server_protocol::TurnStartResponse {
turn: Turn {
id: turn_id.to_string(),
items: vec![],
status: AppServerTurnStatus::InProgress,
error: None,
started_at: None,
completed_at: None,
duration_ms: None,
},
}
})
}
fn sample_turn_started_notification(thread_id: &str, turn_id: &str) -> ServerNotification {
@@ -297,22 +303,25 @@ fn sample_turn_completed_notification(
})
}
fn sample_turn_resolved_config(turn_id: &str) -> TurnResolvedConfigFact {
fn sample_turn_resolved_config(thread_id: &str, turn_id: &str) -> TurnResolvedConfigFact {
TurnResolvedConfigFact {
turn_id: turn_id.to_string(),
thread_id: "thread-2".to_string(),
thread_id: thread_id.to_string(),
num_input_images: 1,
submission_type: None,
ephemeral: false,
session_source: SessionSource::Exec,
model: "gpt-5".to_string(),
model_provider: "openai".to_string(),
sandbox_policy: SandboxPolicy::new_read_only_policy(),
permission_profile: CorePermissionProfile::from_legacy_sandbox_policy(
&SandboxPolicy::new_read_only_policy(),
),
permission_profile_cwd: PathBuf::from("/tmp"),
reasoning_effort: None,
reasoning_summary: None,
service_tier: None,
approval_policy: AskForApproval::OnRequest,
approvals_reviewer: ApprovalsReviewer::GuardianSubagent,
approvals_reviewer: ApprovalsReviewer::AutoReview,
sandbox_network_access: true,
collaboration_mode: ModeKind::Plan,
personality: None,
@@ -344,13 +353,10 @@ fn sample_turn_steer_request(
}
}
fn sample_turn_steer_response(turn_id: &str, request_id: i64) -> ClientResponse {
ClientResponse::TurnSteer {
request_id: RequestId::Integer(request_id),
response: TurnSteerResponse {
turn_id: turn_id.to_string(),
},
}
fn sample_turn_steer_response(turn_id: &str) -> ClientResponsePayload {
ClientResponsePayload::TurnSteer(TurnSteerResponse {
turn_id: turn_id.to_string(),
})
}
fn no_active_turn_steer_error() -> JSONRPCErrorError {
@@ -415,7 +421,39 @@ async fn ingest_rejected_turn_steer(
.await;
reducer
.ingest(
AnalyticsFact::Request {
AnalyticsFact::Initialize {
connection_id: 8,
params: InitializeParams {
client_info: ClientInfo {
name: "codex-web".to_string(),
title: None,
version: "1.0.0".to_string(),
},
capabilities: None,
},
product_client_id: "codex-web".to_string(),
runtime: sample_runtime_metadata(),
rpc_transport: AppServerRpcTransport::Stdio,
},
out,
)
.await;
reducer
.ingest(
AnalyticsFact::ClientResponse {
connection_id: 8,
request_id: RequestId::Integer(6),
response: Box::new(sample_thread_resume_response(
"thread-2", /*ephemeral*/ false, "gpt-5",
)),
},
out,
)
.await;
out.clear();
reducer
.ingest(
AnalyticsFact::ClientRequest {
connection_id: 7,
request_id: RequestId::Integer(4),
request: Box::new(sample_turn_steer_request(
@@ -475,8 +513,9 @@ async fn ingest_turn_prerequisites(
ingest_initialize(reducer, out).await;
reducer
.ingest(
AnalyticsFact::Response {
AnalyticsFact::ClientResponse {
connection_id: 7,
request_id: RequestId::Integer(1),
response: Box::new(sample_thread_start_response(
"thread-2", /*ephemeral*/ false, "gpt-5",
)),
@@ -489,7 +528,7 @@ async fn ingest_turn_prerequisites(
reducer
.ingest(
AnalyticsFact::Request {
AnalyticsFact::ClientRequest {
connection_id: 7,
request_id: RequestId::Integer(3),
request: Box::new(sample_turn_start_request("thread-2", /*request_id*/ 3)),
@@ -499,9 +538,10 @@ async fn ingest_turn_prerequisites(
.await;
reducer
.ingest(
AnalyticsFact::Response {
AnalyticsFact::ClientResponse {
connection_id: 7,
response: Box::new(sample_turn_start_response("turn-2", /*request_id*/ 3)),
request_id: RequestId::Integer(3),
response: Box::new(sample_turn_start_response("turn-2")),
},
out,
)
@@ -511,7 +551,7 @@ async fn ingest_turn_prerequisites(
reducer
.ingest(
AnalyticsFact::Custom(CustomAnalyticsFact::TurnResolvedConfig(Box::new(
sample_turn_resolved_config("turn-2"),
sample_turn_resolved_config("thread-2", "turn-2"),
))),
out,
)
@@ -851,8 +891,9 @@ async fn initialize_caches_client_and_thread_lifecycle_publishes_once_initialize
reducer
.ingest(
AnalyticsFact::Response {
AnalyticsFact::ClientResponse {
connection_id: 7,
request_id: RequestId::Integer(1),
response: Box::new(sample_thread_start_response(
"thread-no-client",
/*ephemeral*/ false,
@@ -895,8 +936,9 @@ async fn initialize_caches_client_and_thread_lifecycle_publishes_once_initialize
reducer
.ingest(
AnalyticsFact::Response {
AnalyticsFact::ClientResponse {
connection_id: 7,
request_id: RequestId::Integer(2),
response: Box::new(sample_thread_resume_response(
"thread-1", /*ephemeral*/ true, "gpt-5",
)),
@@ -943,6 +985,65 @@ async fn initialize_caches_client_and_thread_lifecycle_publishes_once_initialize
);
}
#[tokio::test]
async fn unrelated_client_requests_are_ignored_by_reducer() {
let mut reducer = AnalyticsReducer::default();
let mut events = Vec::new();
reducer
.ingest(
AnalyticsFact::ClientRequest {
connection_id: 7,
request_id: RequestId::Integer(3),
request: Box::new(ClientRequest::ThreadArchive {
request_id: RequestId::Integer(3),
params: ThreadArchiveParams {
thread_id: "thread-2".to_string(),
},
}),
},
&mut events,
)
.await;
reducer
.ingest(
AnalyticsFact::ClientResponse {
connection_id: 7,
request_id: RequestId::Integer(3),
response: Box::new(sample_turn_start_response("turn-2")),
},
&mut events,
)
.await;
assert!(
events.is_empty(),
"unrelated requests must not create pending turn state"
);
}
#[tokio::test]
async fn unrelated_client_responses_are_ignored_by_reducer() {
let mut reducer = AnalyticsReducer::default();
let mut events = Vec::new();
ingest_initialize(&mut reducer, &mut events).await;
reducer
.ingest(
AnalyticsFact::ClientResponse {
connection_id: 7,
request_id: RequestId::Integer(9),
response: Box::new(ClientResponsePayload::ThreadArchive(
ThreadArchiveResponse {},
)),
},
&mut events,
)
.await;
assert!(events.is_empty());
}
#[tokio::test]
async fn compaction_event_ingests_custom_fact() {
let mut reducer = AnalyticsReducer::default();
@@ -975,8 +1076,9 @@ async fn compaction_event_ingests_custom_fact() {
.await;
reducer
.ingest(
AnalyticsFact::Response {
AnalyticsFact::ClientResponse {
connection_id: 7,
request_id: RequestId::Integer(2),
response: Box::new(sample_thread_resume_response_with_source(
"thread-1",
/*ephemeral*/ false,
@@ -1086,8 +1188,9 @@ async fn guardian_review_event_ingests_custom_fact_with_optional_target_item() {
.await;
reducer
.ingest(
AnalyticsFact::Response {
AnalyticsFact::ClientResponse {
connection_id: 7,
request_id: RequestId::Integer(1),
response: Box::new(sample_thread_start_response(
"thread-guardian",
/*ephemeral*/ false,
@@ -1322,7 +1425,7 @@ fn subagent_thread_started_other_serializes_explicit_parent_thread_id() {
},
));
let payload = serde_json::to_value(&event).expect("serialize guardian subagent event");
let payload = serde_json::to_value(&event).expect("serialize auto-review subagent event");
assert_eq!(payload["event_params"]["subagent_source"], "guardian");
assert_eq!(
payload["event_params"]["parent_thread_id"],
@@ -1365,6 +1468,110 @@ async fn subagent_thread_started_publishes_without_initialize() {
assert_eq!(payload[0]["event_params"]["subagent_source"], "review");
}
#[tokio::test]
async fn subagent_thread_started_inherits_parent_connection_for_new_thread() {
let mut reducer = AnalyticsReducer::default();
let mut events = Vec::new();
let parent_thread_id =
codex_protocol::ThreadId::from_string("44444444-4444-4444-4444-444444444444")
.expect("valid parent thread id");
let parent_thread_id_string = parent_thread_id.to_string();
reducer
.ingest(
AnalyticsFact::Initialize {
connection_id: 7,
params: InitializeParams {
client_info: ClientInfo {
name: "parent-client".to_string(),
title: None,
version: "1.0.0".to_string(),
},
capabilities: None,
},
product_client_id: "parent-client".to_string(),
runtime: sample_runtime_metadata(),
rpc_transport: AppServerRpcTransport::Stdio,
},
&mut events,
)
.await;
reducer
.ingest(
AnalyticsFact::ClientResponse {
connection_id: 7,
request_id: RequestId::Integer(1),
response: Box::new(sample_thread_start_response(
&parent_thread_id_string,
/*ephemeral*/ false,
"gpt-5",
)),
},
&mut events,
)
.await;
reducer
.ingest(
AnalyticsFact::Custom(CustomAnalyticsFact::SubAgentThreadStarted(
SubAgentThreadStartedInput {
thread_id: "thread-review".to_string(),
parent_thread_id: None,
product_client_id: "parent-client".to_string(),
client_name: "parent-client".to_string(),
client_version: "1.0.0".to_string(),
model: "gpt-5".to_string(),
ephemeral: false,
subagent_source: SubAgentSource::ThreadSpawn {
parent_thread_id,
depth: 1,
agent_path: None,
agent_nickname: None,
agent_role: None,
},
created_at: 130,
},
)),
&mut events,
)
.await;
events.clear();
reducer
.ingest(
AnalyticsFact::Custom(CustomAnalyticsFact::Compaction(Box::new(
CodexCompactionEvent {
thread_id: "thread-review".to_string(),
turn_id: "turn-compact".to_string(),
trigger: CompactionTrigger::Manual,
reason: CompactionReason::UserRequested,
implementation: CompactionImplementation::Responses,
phase: CompactionPhase::StandaloneTurn,
strategy: CompactionStrategy::Memento,
status: CompactionStatus::Completed,
error: None,
active_context_tokens_before: 131_000,
active_context_tokens_after: 64_000,
started_at: 100,
completed_at: 101,
duration_ms: Some(1200),
},
))),
&mut events,
)
.await;
let payload = serde_json::to_value(&events).expect("serialize events");
assert_eq!(
payload[0]["event_params"]["app_server_client"]["product_client_id"],
"parent-client"
);
assert_eq!(
payload[0]["event_params"]["parent_thread_id"],
"44444444-4444-4444-4444-444444444444"
);
}
#[test]
fn plugin_used_event_serializes_expected_shape() {
let tracking = TrackEventsContext {
@@ -1425,6 +1632,25 @@ fn plugin_management_event_serializes_expected_shape() {
);
}
#[test]
fn plugin_management_event_can_use_remote_plugin_id_override() {
let mut plugin = sample_plugin_metadata();
plugin.remote_plugin_id = Some("plugins~Plugin_remote".to_string());
let event = TrackEventRequest::PluginInstalled(CodexPluginEventRequest {
event_type: "codex_plugin_installed",
event_params: codex_plugin_metadata(plugin),
});
let payload = serde_json::to_value(&event).expect("serialize plugin installed event");
assert_eq!(
payload["event_params"]["plugin_id"],
"plugins~Plugin_remote"
);
assert_eq!(payload["event_params"]["plugin_name"], "sample");
assert_eq!(payload["event_params"]["marketplace_name"], "test");
}
#[test]
fn hook_run_event_serializes_expected_shape() {
let tracking = TrackEventsContext {
@@ -1488,6 +1714,15 @@ fn hook_run_metadata_maps_sources_and_statuses() {
},
))
.expect("serialize project hook");
let cloud_requirements = serde_json::to_value(codex_hook_run_metadata(
&tracking,
HookRunFact {
event_name: HookEventName::Stop,
hook_source: HookSource::CloudRequirements,
status: HookRunStatus::Blocked,
},
))
.expect("serialize cloud requirements hook");
let unknown = serde_json::to_value(codex_hook_run_metadata(
&tracking,
HookRunFact {
@@ -1502,6 +1737,8 @@ fn hook_run_metadata_maps_sources_and_statuses() {
assert_eq!(system["status"], "completed");
assert_eq!(project["hook_source"], "project");
assert_eq!(project["status"], "blocked");
assert_eq!(cloud_requirements["hook_source"], "cloud_requirements");
assert_eq!(cloud_requirements["status"], "blocked");
assert_eq!(unknown["hook_source"], "unknown");
assert_eq!(unknown["status"], "failed");
}
@@ -1746,7 +1983,7 @@ fn turn_event_serializes_expected_shape() {
reasoning_summary: Some("detailed".to_string()),
service_tier: "flex".to_string(),
approval_policy: "on-request".to_string(),
approvals_reviewer: "guardian_subagent".to_string(),
approvals_reviewer: "auto_review".to_string(),
sandbox_network_access: true,
collaboration_mode: Some("plan"),
personality: Some("pragmatic".to_string()),
@@ -1807,7 +2044,7 @@ fn turn_event_serializes_expected_shape() {
"reasoning_summary": "detailed",
"service_tier": "flex",
"approval_policy": "on-request",
"approvals_reviewer": "guardian_subagent",
"approvals_reviewer": "auto_review",
"sandbox_network_access": true,
"collaboration_mode": "plan",
"personality": "pragmatic",
@@ -1856,7 +2093,7 @@ async fn accepted_turn_steer_emits_expected_event() {
.await;
reducer
.ingest(
AnalyticsFact::Request {
AnalyticsFact::ClientRequest {
connection_id: 7,
request_id: RequestId::Integer(4),
request: Box::new(sample_turn_steer_request(
@@ -1868,9 +2105,10 @@ async fn accepted_turn_steer_emits_expected_event() {
.await;
reducer
.ingest(
AnalyticsFact::Response {
AnalyticsFact::ClientResponse {
connection_id: 7,
response: Box::new(sample_turn_steer_response("turn-2", /*request_id*/ 4)),
request_id: RequestId::Integer(4),
response: Box::new(sample_turn_steer_response("turn-2")),
},
&mut out,
)
@@ -2010,7 +2248,7 @@ async fn turn_start_error_response_discards_pending_start_request() {
ingest_initialize(&mut reducer, &mut out).await;
reducer
.ingest(
AnalyticsFact::Request {
AnalyticsFact::ClientRequest {
connection_id: 7,
request_id: RequestId::Integer(3),
request: Box::new(sample_turn_start_request("thread-2", /*request_id*/ 3)),
@@ -2034,9 +2272,10 @@ async fn turn_start_error_response_discards_pending_start_request() {
// failed turn/start request and attach request-scoped connection metadata.
reducer
.ingest(
AnalyticsFact::Response {
AnalyticsFact::ClientResponse {
connection_id: 7,
response: Box::new(sample_turn_start_response("turn-2", /*request_id*/ 3)),
request_id: RequestId::Integer(3),
response: Box::new(sample_turn_start_response("turn-2")),
},
&mut out,
)
@@ -2046,7 +2285,7 @@ async fn turn_start_error_response_discards_pending_start_request() {
reducer
.ingest(
AnalyticsFact::Custom(CustomAnalyticsFact::TurnResolvedConfig(Box::new(
sample_turn_resolved_config("turn-2"),
sample_turn_resolved_config("thread-2", "turn-2"),
))),
&mut out,
)
@@ -2151,7 +2390,7 @@ async fn accepted_steers_increment_turn_steer_count() {
reducer
.ingest(
AnalyticsFact::Request {
AnalyticsFact::ClientRequest {
connection_id: 7,
request_id: RequestId::Integer(4),
request: Box::new(sample_turn_steer_request(
@@ -2163,9 +2402,10 @@ async fn accepted_steers_increment_turn_steer_count() {
.await;
reducer
.ingest(
AnalyticsFact::Response {
AnalyticsFact::ClientResponse {
connection_id: 7,
response: Box::new(sample_turn_steer_response("turn-2", /*request_id*/ 4)),
request_id: RequestId::Integer(4),
response: Box::new(sample_turn_steer_response("turn-2")),
},
&mut out,
)
@@ -2173,7 +2413,7 @@ async fn accepted_steers_increment_turn_steer_count() {
reducer
.ingest(
AnalyticsFact::Request {
AnalyticsFact::ClientRequest {
connection_id: 7,
request_id: RequestId::Integer(5),
request: Box::new(sample_turn_steer_request(
@@ -2197,7 +2437,7 @@ async fn accepted_steers_increment_turn_steer_count() {
reducer
.ingest(
AnalyticsFact::Request {
AnalyticsFact::ClientRequest {
connection_id: 7,
request_id: RequestId::Integer(6),
request: Box::new(sample_turn_steer_request(
@@ -2209,9 +2449,10 @@ async fn accepted_steers_increment_turn_steer_count() {
.await;
reducer
.ingest(
AnalyticsFact::Response {
AnalyticsFact::ClientResponse {
connection_id: 7,
response: Box::new(sample_turn_steer_response("turn-2", /*request_id*/ 6)),
request_id: RequestId::Integer(6),
response: Box::new(sample_turn_steer_response("turn-2")),
},
&mut out,
)
@@ -2396,6 +2637,7 @@ async fn turn_completed_without_started_notification_emits_null_started_at() {
fn sample_plugin_metadata() -> PluginTelemetryMetadata {
PluginTelemetryMetadata {
plugin_id: PluginId::parse("sample@test").expect("valid plugin id"),
remote_plugin_id: None,
capability_summary: Some(PluginCapabilitySummary {
config_name: "sample@test".to_string(),
display_name: "sample".to_string(),

View File

@@ -1,5 +1,6 @@
use crate::events::AppServerRpcTransport;
use crate::events::GuardianReviewEventParams;
use crate::events::GuardianReviewAnalyticsResult;
use crate::events::GuardianReviewTrackContext;
use crate::events::TrackEventRequest;
use crate::events::TrackEventsRequest;
use crate::events::current_runtime_metadata;
@@ -21,11 +22,13 @@ use crate::facts::TurnResolvedConfigFact;
use crate::facts::TurnTokenUsageFact;
use crate::reducer::AnalyticsReducer;
use codex_app_server_protocol::ClientRequest;
use codex_app_server_protocol::ClientResponse;
use codex_app_server_protocol::ClientResponsePayload;
use codex_app_server_protocol::InitializeParams;
use codex_app_server_protocol::JSONRPCErrorError;
use codex_app_server_protocol::RequestId;
use codex_app_server_protocol::ServerNotification;
use codex_app_server_protocol::ServerRequest;
use codex_app_server_protocol::ServerResponse;
use codex_login::AuthManager;
use codex_login::default_client::create_client;
use codex_plugin::PluginTelemetryMetadata;
@@ -48,8 +51,7 @@ pub(crate) struct AnalyticsEventsQueue {
#[derive(Clone)]
pub struct AnalyticsEventsClient {
queue: AnalyticsEventsQueue,
analytics_enabled: Option<bool>,
queue: Option<AnalyticsEventsQueue>,
}
impl AnalyticsEventsQueue {
@@ -118,11 +120,15 @@ impl AnalyticsEventsClient {
analytics_enabled: Option<bool>,
) -> Self {
Self {
queue: AnalyticsEventsQueue::new(Arc::clone(&auth_manager), base_url),
analytics_enabled,
queue: (analytics_enabled != Some(false))
.then(|| AnalyticsEventsQueue::new(Arc::clone(&auth_manager), base_url)),
}
}
pub fn disabled() -> Self {
Self { queue: None }
}
pub fn track_skill_invocations(
&self,
tracking: TrackEventsContext,
@@ -161,9 +167,13 @@ impl AnalyticsEventsClient {
));
}
pub fn track_guardian_review(&self, input: GuardianReviewEventParams) {
pub fn track_guardian_review(
&self,
tracking: &GuardianReviewTrackContext,
result: GuardianReviewAnalyticsResult,
) {
self.record_fact(AnalyticsFact::Custom(CustomAnalyticsFact::GuardianReview(
Box::new(input),
Box::new(tracking.event_params(result)),
)));
}
@@ -176,16 +186,30 @@ impl AnalyticsEventsClient {
)));
}
pub fn track_request(&self, connection_id: u64, request_id: RequestId, request: ClientRequest) {
self.record_fact(AnalyticsFact::Request {
pub fn track_request(
&self,
connection_id: u64,
request_id: RequestId,
request: &ClientRequest,
) {
if !matches!(
request,
ClientRequest::TurnStart { .. } | ClientRequest::TurnSteer { .. }
) {
return;
}
self.record_fact(AnalyticsFact::ClientRequest {
connection_id,
request_id,
request: Box::new(request),
request: Box::new(request.clone()),
});
}
pub fn track_app_used(&self, tracking: TrackEventsContext, app: AppInvocation) {
if !self.queue.should_enqueue_app_used(&tracking, &app) {
let Some(queue) = self.queue.as_ref() else {
return;
};
if !queue.should_enqueue_app_used(&tracking, &app) {
return;
}
self.record_fact(AnalyticsFact::Custom(CustomAnalyticsFact::AppUsed(
@@ -200,7 +224,10 @@ impl AnalyticsEventsClient {
}
pub fn track_plugin_used(&self, tracking: TrackEventsContext, plugin: PluginTelemetryMetadata) {
if !self.queue.should_enqueue_plugin_used(&tracking, &plugin) {
let Some(queue) = self.queue.as_ref() else {
return;
};
if !queue.should_enqueue_plugin_used(&tracking, &plugin) {
return;
}
self.record_fact(AnalyticsFact::Custom(CustomAnalyticsFact::PluginUsed(
@@ -263,15 +290,30 @@ impl AnalyticsEventsClient {
}
pub(crate) fn record_fact(&self, input: AnalyticsFact) {
if self.analytics_enabled == Some(false) {
return;
if let Some(queue) = self.queue.as_ref() {
queue.try_send(input);
}
self.queue.try_send(input);
}
pub fn track_response(&self, connection_id: u64, response: ClientResponse) {
self.record_fact(AnalyticsFact::Response {
pub fn track_response(
&self,
connection_id: u64,
request_id: RequestId,
response: ClientResponsePayload,
) {
if !matches!(
response,
ClientResponsePayload::ThreadStart(_)
| ClientResponsePayload::ThreadResume(_)
| ClientResponsePayload::ThreadFork(_)
| ClientResponsePayload::TurnStart(_)
| ClientResponsePayload::TurnSteer(_)
) {
return;
}
self.record_fact(AnalyticsFact::ClientResponse {
connection_id,
request_id,
response: Box::new(response),
});
}
@@ -294,10 +336,23 @@ impl AnalyticsEventsClient {
pub fn track_notification(&self, notification: ServerNotification) {
self.record_fact(AnalyticsFact::Notification(Box::new(notification)));
}
pub fn track_server_request(&self, connection_id: u64, request: ServerRequest) {
self.record_fact(AnalyticsFact::ServerRequest {
connection_id,
request: Box::new(request),
});
}
pub fn track_server_response(&self, response: ServerResponse) {
self.record_fact(AnalyticsFact::ServerResponse {
response: Box::new(response),
});
}
}
async fn send_track_events(
auth_manager: &Arc<AuthManager>,
auth_manager: &AuthManager,
base_url: &str,
events: Vec<TrackEventRequest>,
) {
@@ -307,34 +362,22 @@ async fn send_track_events(
let Some(auth) = auth_manager.auth().await else {
return;
};
if !auth.is_chatgpt_auth() {
if !auth.uses_codex_backend() {
return;
}
let Some(authorization_header_value) = auth_manager
.chatgpt_authorization_header_for_auth(&auth)
.await
else {
return;
};
let Some(account_id) = auth.get_account_id() else {
return;
};
let base_url = base_url.trim_end_matches('/');
let url = format!("{base_url}/codex/analytics-events/events");
let payload = TrackEventsRequest { events };
let mut request = create_client()
let response = create_client()
.post(&url)
.timeout(ANALYTICS_EVENTS_TIMEOUT)
.header("authorization", authorization_header_value)
.header("chatgpt-account-id", &account_id)
.headers(codex_model_provider::auth_provider_from_auth(&auth).to_auth_headers())
.header("Content-Type", "application/json")
.json(&payload);
if auth.is_fedramp_account() {
request = request.header("X-OpenAI-Fedramp", "true");
}
let response = request.send().await;
.json(&payload)
.send()
.await;
match response {
Ok(response) if response.status().is_success() => {}
@@ -348,3 +391,7 @@ async fn send_track_events(
}
}
}
#[cfg(test)]
#[path = "client_tests.rs"]
mod tests;

View File

@@ -0,0 +1,221 @@
use super::AnalyticsEventsClient;
use super::AnalyticsEventsQueue;
use crate::facts::AnalyticsFact;
use codex_app_server_protocol::ApprovalsReviewer as AppServerApprovalsReviewer;
use codex_app_server_protocol::AskForApproval as AppServerAskForApproval;
use codex_app_server_protocol::ClientRequest;
use codex_app_server_protocol::ClientResponsePayload;
use codex_app_server_protocol::PermissionProfile as AppServerPermissionProfile;
use codex_app_server_protocol::RequestId;
use codex_app_server_protocol::SandboxPolicy as AppServerSandboxPolicy;
use codex_app_server_protocol::SessionSource as AppServerSessionSource;
use codex_app_server_protocol::Thread;
use codex_app_server_protocol::ThreadArchiveParams;
use codex_app_server_protocol::ThreadArchiveResponse;
use codex_app_server_protocol::ThreadForkResponse;
use codex_app_server_protocol::ThreadResumeResponse;
use codex_app_server_protocol::ThreadStartResponse;
use codex_app_server_protocol::ThreadStatus as AppServerThreadStatus;
use codex_app_server_protocol::Turn;
use codex_app_server_protocol::TurnStartParams;
use codex_app_server_protocol::TurnStartResponse;
use codex_app_server_protocol::TurnStatus as AppServerTurnStatus;
use codex_app_server_protocol::TurnSteerParams;
use codex_app_server_protocol::TurnSteerResponse;
use codex_protocol::models::PermissionProfile as CorePermissionProfile;
use codex_utils_absolute_path::test_support::PathBufExt;
use codex_utils_absolute_path::test_support::test_path_buf;
use std::collections::HashSet;
use std::sync::Arc;
use std::sync::Mutex;
use tokio::sync::mpsc;
use tokio::sync::mpsc::error::TryRecvError;
fn client_with_receiver() -> (AnalyticsEventsClient, mpsc::Receiver<AnalyticsFact>) {
let (sender, receiver) = mpsc::channel(8);
let queue = AnalyticsEventsQueue {
sender,
app_used_emitted_keys: Arc::new(Mutex::new(HashSet::new())),
plugin_used_emitted_keys: Arc::new(Mutex::new(HashSet::new())),
};
(AnalyticsEventsClient { queue: Some(queue) }, receiver)
}
fn sample_turn_start_request() -> ClientRequest {
ClientRequest::TurnStart {
request_id: RequestId::Integer(1),
params: TurnStartParams {
thread_id: "thread-1".to_string(),
input: Vec::new(),
..Default::default()
},
}
}
fn sample_turn_steer_request() -> ClientRequest {
ClientRequest::TurnSteer {
request_id: RequestId::Integer(2),
params: TurnSteerParams {
thread_id: "thread-1".to_string(),
expected_turn_id: "turn-1".to_string(),
input: Vec::new(),
responsesapi_client_metadata: None,
},
}
}
fn sample_thread_archive_request() -> ClientRequest {
ClientRequest::ThreadArchive {
request_id: RequestId::Integer(3),
params: ThreadArchiveParams {
thread_id: "thread-1".to_string(),
},
}
}
fn sample_thread(thread_id: &str) -> Thread {
Thread {
id: thread_id.to_string(),
forked_from_id: None,
preview: "first prompt".to_string(),
ephemeral: false,
model_provider: "openai".to_string(),
created_at: 1,
updated_at: 2,
status: AppServerThreadStatus::Idle,
path: None,
cwd: test_path_buf("/tmp").abs(),
cli_version: "0.0.0".to_string(),
source: AppServerSessionSource::Exec,
agent_nickname: None,
agent_role: None,
git_info: None,
name: None,
turns: Vec::new(),
}
}
fn sample_permission_profile() -> AppServerPermissionProfile {
CorePermissionProfile::Disabled.into()
}
fn sample_thread_start_response() -> ClientResponsePayload {
ClientResponsePayload::ThreadStart(ThreadStartResponse {
thread: sample_thread("thread-1"),
model: "gpt-5".to_string(),
model_provider: "openai".to_string(),
service_tier: None,
cwd: test_path_buf("/tmp").abs(),
instruction_sources: Vec::new(),
approval_policy: AppServerAskForApproval::OnFailure,
approvals_reviewer: AppServerApprovalsReviewer::User,
sandbox: AppServerSandboxPolicy::DangerFullAccess,
permission_profile: Some(sample_permission_profile()),
active_permission_profile: None,
reasoning_effort: None,
})
}
fn sample_thread_resume_response() -> ClientResponsePayload {
ClientResponsePayload::ThreadResume(ThreadResumeResponse {
thread: sample_thread("thread-2"),
model: "gpt-5".to_string(),
model_provider: "openai".to_string(),
service_tier: None,
cwd: test_path_buf("/tmp").abs(),
instruction_sources: Vec::new(),
approval_policy: AppServerAskForApproval::OnFailure,
approvals_reviewer: AppServerApprovalsReviewer::User,
sandbox: AppServerSandboxPolicy::DangerFullAccess,
permission_profile: Some(sample_permission_profile()),
active_permission_profile: None,
reasoning_effort: None,
})
}
fn sample_thread_fork_response() -> ClientResponsePayload {
ClientResponsePayload::ThreadFork(ThreadForkResponse {
thread: sample_thread("thread-3"),
model: "gpt-5".to_string(),
model_provider: "openai".to_string(),
service_tier: None,
cwd: test_path_buf("/tmp").abs(),
instruction_sources: Vec::new(),
approval_policy: AppServerAskForApproval::OnFailure,
approvals_reviewer: AppServerApprovalsReviewer::User,
sandbox: AppServerSandboxPolicy::DangerFullAccess,
permission_profile: Some(sample_permission_profile()),
active_permission_profile: None,
reasoning_effort: None,
})
}
fn sample_turn_start_response() -> ClientResponsePayload {
ClientResponsePayload::TurnStart(TurnStartResponse {
turn: Turn {
id: "turn-1".to_string(),
items: Vec::new(),
status: AppServerTurnStatus::InProgress,
error: None,
started_at: None,
completed_at: None,
duration_ms: None,
},
})
}
fn sample_turn_steer_response() -> ClientResponsePayload {
ClientResponsePayload::TurnSteer(TurnSteerResponse {
turn_id: "turn-2".to_string(),
})
}
#[test]
fn track_request_only_enqueues_analytics_relevant_requests() {
let (client, mut receiver) = client_with_receiver();
for (request_id, request) in [
(RequestId::Integer(1), sample_turn_start_request()),
(RequestId::Integer(2), sample_turn_steer_request()),
] {
client.track_request(/*connection_id*/ 7, request_id, &request);
assert!(matches!(
receiver.try_recv(),
Ok(AnalyticsFact::ClientRequest { .. })
));
}
let ignored_request = sample_thread_archive_request();
client.track_request(
/*connection_id*/ 7,
RequestId::Integer(3),
&ignored_request,
);
assert!(matches!(receiver.try_recv(), Err(TryRecvError::Empty)));
}
#[test]
fn track_response_only_enqueues_analytics_relevant_responses() {
let (client, mut receiver) = client_with_receiver();
for (request_id, response) in [
(RequestId::Integer(1), sample_thread_start_response()),
(RequestId::Integer(2), sample_thread_resume_response()),
(RequestId::Integer(3), sample_thread_fork_response()),
(RequestId::Integer(4), sample_turn_start_response()),
(RequestId::Integer(5), sample_turn_steer_response()),
] {
client.track_response(/*connection_id*/ 7, request_id, response);
assert!(matches!(
receiver.try_recv(),
Ok(AnalyticsFact::ClientResponse { .. })
));
}
client.track_response(
/*connection_id*/ 7,
RequestId::Integer(6),
ClientResponsePayload::ThreadArchive(ThreadArchiveResponse {}),
);
assert!(matches!(receiver.try_recv(), Err(TryRecvError::Empty)));
}

View File

@@ -1,3 +1,5 @@
use std::time::Instant;
use crate::facts::AppInvocation;
use crate::facts::CodexCompactionEvent;
use crate::facts::CompactionImplementation;
@@ -16,11 +18,12 @@ use crate::facts::TurnStatus;
use crate::facts::TurnSteerRejectionReason;
use crate::facts::TurnSteerResult;
use crate::facts::TurnSubmissionType;
use crate::now_unix_seconds;
use codex_app_server_protocol::CodexErrorInfo;
use codex_login::default_client::originator;
use codex_plugin::PluginTelemetryMetadata;
use codex_protocol::approvals::NetworkApprovalProtocol;
use codex_protocol::models::PermissionProfile;
use codex_protocol::models::AdditionalPermissionProfile;
use codex_protocol::models::SandboxPermissions;
use codex_protocol::protocol::GuardianAssessmentOutcome;
use codex_protocol::protocol::GuardianCommandSource;
@@ -30,6 +33,7 @@ use codex_protocol::protocol::HookEventName;
use codex_protocol::protocol::HookRunStatus;
use codex_protocol::protocol::HookSource;
use codex_protocol::protocol::SubAgentSource;
use codex_protocol::protocol::TokenUsage;
use serde::Serialize;
#[derive(Clone, Copy, Debug, Serialize)]
@@ -176,17 +180,17 @@ pub enum GuardianApprovalRequestSource {
pub enum GuardianReviewedAction {
Shell {
sandbox_permissions: SandboxPermissions,
additional_permissions: Option<PermissionProfile>,
additional_permissions: Option<AdditionalPermissionProfile>,
},
UnifiedExec {
sandbox_permissions: SandboxPermissions,
additional_permissions: Option<PermissionProfile>,
additional_permissions: Option<AdditionalPermissionProfile>,
tty: bool,
},
Execve {
source: GuardianCommandSource,
program: String,
additional_permissions: Option<PermissionProfile>,
additional_permissions: Option<AdditionalPermissionProfile>,
},
ApplyPatch {},
NetworkAccess {
@@ -200,6 +204,7 @@ pub enum GuardianReviewedAction {
connector_name: Option<String>,
tool_title: Option<String>,
},
RequestPermissions {},
}
#[derive(Clone, Serialize)]
@@ -235,6 +240,142 @@ pub struct GuardianReviewEventParams {
pub total_tokens: Option<i64>,
}
pub struct GuardianReviewTrackContext {
thread_id: String,
turn_id: String,
review_id: String,
target_item_id: Option<String>,
approval_request_source: GuardianApprovalRequestSource,
reviewed_action: GuardianReviewedAction,
review_timeout_ms: u64,
started_at: u64,
started_instant: Instant,
}
impl GuardianReviewTrackContext {
pub fn new(
thread_id: String,
turn_id: String,
review_id: String,
target_item_id: Option<String>,
approval_request_source: GuardianApprovalRequestSource,
reviewed_action: GuardianReviewedAction,
review_timeout_ms: u64,
) -> Self {
Self {
thread_id,
turn_id,
review_id,
target_item_id,
approval_request_source,
reviewed_action,
review_timeout_ms,
started_at: now_unix_seconds(),
started_instant: Instant::now(),
}
}
pub(crate) fn event_params(
&self,
result: GuardianReviewAnalyticsResult,
) -> GuardianReviewEventParams {
GuardianReviewEventParams {
thread_id: self.thread_id.clone(),
turn_id: self.turn_id.clone(),
review_id: self.review_id.clone(),
target_item_id: self.target_item_id.clone(),
approval_request_source: self.approval_request_source,
reviewed_action: self.reviewed_action.clone(),
reviewed_action_truncated: result.reviewed_action_truncated,
decision: result.decision,
terminal_status: result.terminal_status,
failure_reason: result.failure_reason,
risk_level: result.risk_level,
user_authorization: result.user_authorization,
outcome: result.outcome,
guardian_thread_id: result.guardian_thread_id,
guardian_session_kind: result.guardian_session_kind,
guardian_model: result.guardian_model,
guardian_reasoning_effort: result.guardian_reasoning_effort,
had_prior_review_context: result.had_prior_review_context,
review_timeout_ms: self.review_timeout_ms,
// TODO(rhan-oai): plumb nested Guardian review session tool-call counts.
tool_call_count: None,
time_to_first_token_ms: result.time_to_first_token_ms,
completion_latency_ms: Some(self.started_instant.elapsed().as_millis() as u64),
started_at: self.started_at,
completed_at: Some(now_unix_seconds()),
input_tokens: result.token_usage.as_ref().map(|usage| usage.input_tokens),
cached_input_tokens: result
.token_usage
.as_ref()
.map(|usage| usage.cached_input_tokens),
output_tokens: result.token_usage.as_ref().map(|usage| usage.output_tokens),
reasoning_output_tokens: result
.token_usage
.as_ref()
.map(|usage| usage.reasoning_output_tokens),
total_tokens: result.token_usage.as_ref().map(|usage| usage.total_tokens),
}
}
}
#[derive(Debug)]
pub struct GuardianReviewAnalyticsResult {
pub decision: GuardianReviewDecision,
pub terminal_status: GuardianReviewTerminalStatus,
pub failure_reason: Option<GuardianReviewFailureReason>,
pub risk_level: Option<GuardianRiskLevel>,
pub user_authorization: Option<GuardianUserAuthorization>,
pub outcome: Option<GuardianAssessmentOutcome>,
pub guardian_thread_id: Option<String>,
pub guardian_session_kind: Option<GuardianReviewSessionKind>,
pub guardian_model: Option<String>,
pub guardian_reasoning_effort: Option<String>,
pub had_prior_review_context: Option<bool>,
pub reviewed_action_truncated: bool,
pub token_usage: Option<TokenUsage>,
pub time_to_first_token_ms: Option<u64>,
}
impl GuardianReviewAnalyticsResult {
pub fn without_session() -> Self {
Self {
decision: GuardianReviewDecision::Denied,
terminal_status: GuardianReviewTerminalStatus::FailedClosed,
failure_reason: None,
risk_level: None,
user_authorization: None,
outcome: None,
guardian_thread_id: None,
guardian_session_kind: None,
guardian_model: None,
guardian_reasoning_effort: None,
had_prior_review_context: None,
reviewed_action_truncated: false,
token_usage: None,
time_to_first_token_ms: None,
}
}
pub fn from_session(
guardian_thread_id: String,
guardian_session_kind: GuardianReviewSessionKind,
guardian_model: String,
guardian_reasoning_effort: Option<String>,
had_prior_review_context: bool,
) -> Self {
Self {
guardian_thread_id: Some(guardian_thread_id),
guardian_session_kind: Some(guardian_session_kind),
guardian_model: Some(guardian_model),
guardian_reasoning_effort,
had_prior_review_context: Some(had_prior_review_context),
..Self::without_session()
}
}
}
#[derive(Serialize)]
pub(crate) struct GuardianReviewEventPayload {
pub(crate) app_server_client: CodexAppServerClientMetadata,
@@ -446,11 +587,16 @@ pub(crate) fn codex_app_metadata(
}
pub(crate) fn codex_plugin_metadata(plugin: PluginTelemetryMetadata) -> CodexPluginMetadata {
let capability_summary = plugin.capability_summary;
let PluginTelemetryMetadata {
plugin_id,
remote_plugin_id,
capability_summary,
} = plugin;
let event_plugin_id = remote_plugin_id.unwrap_or_else(|| plugin_id.as_key());
CodexPluginMetadata {
plugin_id: Some(plugin.plugin_id.as_key()),
plugin_name: Some(plugin.plugin_id.plugin_name),
marketplace_name: Some(plugin.plugin_id.marketplace_name),
plugin_id: Some(event_plugin_id),
plugin_name: Some(plugin_id.plugin_name),
marketplace_name: Some(plugin_id.marketplace_name),
has_skills: capability_summary
.as_ref()
.map(|summary| summary.has_skills),
@@ -543,6 +689,8 @@ fn analytics_hook_source(source: HookSource) -> &'static str {
HookSource::Project => "project",
HookSource::Mdm => "mdm",
HookSource::SessionFlags => "session_flags",
HookSource::Plugin => "plugin",
HookSource::CloudRequirements => "cloud_requirements",
HookSource::LegacyManagedConfigFile => "legacy_managed_config_file",
HookSource::LegacyManagedConfigMdm => "legacy_managed_config_mdm",
HookSource::Unknown => "unknown",

View File

@@ -2,23 +2,25 @@ use crate::events::AppServerRpcTransport;
use crate::events::CodexRuntimeMetadata;
use crate::events::GuardianReviewEventParams;
use codex_app_server_protocol::ClientRequest;
use codex_app_server_protocol::ClientResponse;
use codex_app_server_protocol::ClientResponsePayload;
use codex_app_server_protocol::InitializeParams;
use codex_app_server_protocol::JSONRPCErrorError;
use codex_app_server_protocol::RequestId;
use codex_app_server_protocol::ServerNotification;
use codex_app_server_protocol::ServerRequest;
use codex_app_server_protocol::ServerResponse;
use codex_plugin::PluginTelemetryMetadata;
use codex_protocol::config_types::ApprovalsReviewer;
use codex_protocol::config_types::ModeKind;
use codex_protocol::config_types::Personality;
use codex_protocol::config_types::ReasoningSummary;
use codex_protocol::config_types::ServiceTier;
use codex_protocol::models::PermissionProfile;
use codex_protocol::openai_models::ReasoningEffort;
use codex_protocol::protocol::AskForApproval;
use codex_protocol::protocol::HookEventName;
use codex_protocol::protocol::HookRunStatus;
use codex_protocol::protocol::HookSource;
use codex_protocol::protocol::SandboxPolicy;
use codex_protocol::protocol::SessionSource;
use codex_protocol::protocol::SkillScope;
use codex_protocol::protocol::SubAgentSource;
@@ -62,7 +64,8 @@ pub struct TurnResolvedConfigFact {
pub session_source: SessionSource,
pub model: String,
pub model_provider: String,
pub sandbox_policy: SandboxPolicy,
pub permission_profile: PermissionProfile,
pub permission_profile_cwd: PathBuf,
pub reasoning_effort: Option<ReasoningEffort>,
pub reasoning_summary: Option<ReasoningSummary>,
pub service_tier: Option<ServiceTier>,
@@ -271,14 +274,15 @@ pub(crate) enum AnalyticsFact {
runtime: CodexRuntimeMetadata,
rpc_transport: AppServerRpcTransport,
},
Request {
ClientRequest {
connection_id: u64,
request_id: RequestId,
request: Box<ClientRequest>,
},
Response {
ClientResponse {
connection_id: u64,
response: Box<ClientResponse>,
request_id: RequestId,
response: Box<ClientResponsePayload>,
},
ErrorResponse {
connection_id: u64,
@@ -286,6 +290,13 @@ pub(crate) enum AnalyticsFact {
error: JSONRPCErrorError,
error_type: Option<AnalyticsJsonRpcError>,
},
ServerRequest {
connection_id: u64,
request: Box<ServerRequest>,
},
ServerResponse {
response: Box<ServerResponse>,
},
Notification(Box<ServerNotification>),
// Facts that do not naturally exist on the app-server protocol surface, or
// would require non-trivial protocol reshaping on this branch.

View File

@@ -9,11 +9,13 @@ use std::time::UNIX_EPOCH;
pub use client::AnalyticsEventsClient;
pub use events::AppServerRpcTransport;
pub use events::GuardianApprovalRequestSource;
pub use events::GuardianReviewAnalyticsResult;
pub use events::GuardianReviewDecision;
pub use events::GuardianReviewEventParams;
pub use events::GuardianReviewFailureReason;
pub use events::GuardianReviewSessionKind;
pub use events::GuardianReviewTerminalStatus;
pub use events::GuardianReviewTrackContext;
pub use events::GuardianReviewedAction;
pub use facts::AnalyticsJsonRpcError;
pub use facts::AppInvocation;

View File

@@ -61,7 +61,7 @@ use codex_login::default_client::originator;
use codex_protocol::config_types::ModeKind;
use codex_protocol::config_types::Personality;
use codex_protocol::config_types::ReasoningSummary;
use codex_protocol::protocol::SandboxPolicy;
use codex_protocol::models::PermissionProfile;
use codex_protocol::protocol::SessionSource;
use codex_protocol::protocol::SkillScope;
use codex_protocol::protocol::TokenUsage;
@@ -74,8 +74,7 @@ pub(crate) struct AnalyticsReducer {
requests: HashMap<(u64, RequestId), RequestState>,
turns: HashMap<String, TurnState>,
connections: HashMap<u64, ConnectionState>,
thread_connections: HashMap<String, u64>,
thread_metadata: HashMap<String, ThreadMetadataState>,
threads: HashMap<String, ThreadAnalyticsState>,
}
struct ConnectionState {
@@ -83,6 +82,69 @@ struct ConnectionState {
runtime: CodexRuntimeMetadata,
}
#[derive(Default)]
struct ThreadAnalyticsState {
connection_id: Option<u64>,
metadata: Option<ThreadMetadataState>,
}
#[derive(Clone, Copy)]
struct AnalyticsDropSite<'a> {
event_name: &'static str,
thread_id: &'a str,
turn_id: Option<&'a str>,
review_id: Option<&'a str>,
item_id: Option<&'a str>,
}
impl<'a> AnalyticsDropSite<'a> {
fn guardian(input: &'a GuardianReviewEventParams) -> Self {
Self {
event_name: "guardian",
thread_id: &input.thread_id,
turn_id: Some(&input.turn_id),
review_id: Some(&input.review_id),
item_id: None,
}
}
fn compaction(input: &'a CodexCompactionEvent) -> Self {
Self {
event_name: "compaction",
thread_id: &input.thread_id,
turn_id: Some(&input.turn_id),
review_id: None,
item_id: None,
}
}
fn turn_steer(thread_id: &'a str) -> Self {
Self {
event_name: "turn steer",
thread_id,
turn_id: None,
review_id: None,
item_id: None,
}
}
fn turn(thread_id: &'a str, turn_id: &'a str) -> Self {
Self {
event_name: "turn",
thread_id,
turn_id: Some(turn_id),
review_id: None,
item_id: None,
}
}
}
enum MissingAnalyticsContext {
ThreadConnection,
Connection { connection_id: u64 },
ThreadMetadata,
}
#[derive(Clone)]
struct ThreadMetadataState {
thread_source: Option<&'static str>,
@@ -106,6 +168,7 @@ impl ThreadMetadataState {
| SessionSource::Exec
| SessionSource::Mcp
| SessionSource::Custom(_)
| SessionSource::Internal(_)
| SessionSource::Unknown => (None, None),
};
Self {
@@ -171,18 +234,21 @@ impl AnalyticsReducer {
rpc_transport,
);
}
AnalyticsFact::Request {
AnalyticsFact::ClientRequest {
connection_id,
request_id,
request,
} => {
self.ingest_request(connection_id, request_id, *request);
}
AnalyticsFact::Response {
AnalyticsFact::ClientResponse {
connection_id,
request_id,
response,
} => {
self.ingest_response(connection_id, *response, out);
if let Some(response) = response.into_client_response(request_id) {
self.ingest_response(connection_id, response, out);
}
}
AnalyticsFact::ErrorResponse {
connection_id,
@@ -195,6 +261,13 @@ impl AnalyticsReducer {
AnalyticsFact::Notification(notification) => {
self.ingest_notification(*notification, out);
}
AnalyticsFact::ServerRequest {
connection_id: _connection_id,
request: _request,
} => {}
AnalyticsFact::ServerResponse {
response: _response,
} => {}
AnalyticsFact::Custom(input) => match input {
CustomAnalyticsFact::SubAgentThreadStarted(input) => {
self.ingest_subagent_thread_started(input, out);
@@ -263,6 +336,26 @@ impl AnalyticsReducer {
input: SubAgentThreadStartedInput,
out: &mut Vec<TrackEventRequest>,
) {
let parent_thread_id = input
.parent_thread_id
.clone()
.or_else(|| subagent_parent_thread_id(&input.subagent_source));
let parent_connection_id = parent_thread_id
.as_ref()
.and_then(|parent_thread_id| self.threads.get(parent_thread_id))
.and_then(|thread| thread.connection_id);
let thread_state = self.threads.entry(input.thread_id.clone()).or_default();
thread_state
.metadata
.get_or_insert_with(|| ThreadMetadataState {
thread_source: Some("subagent"),
initialization_mode: ThreadInitializationMode::New,
subagent_source: Some(subagent_source_name(&input.subagent_source)),
parent_thread_id,
});
if thread_state.connection_id.is_none() {
thread_state.connection_id = parent_connection_id;
}
out.push(TrackEventRequest::ThreadInitialized(
subagent_thread_started_event_request(input),
));
@@ -273,23 +366,9 @@ impl AnalyticsReducer {
input: GuardianReviewEventParams,
out: &mut Vec<TrackEventRequest>,
) {
let Some(connection_id) = self.thread_connections.get(&input.thread_id) else {
tracing::warn!(
thread_id = %input.thread_id,
turn_id = %input.turn_id,
review_id = %input.review_id,
"dropping guardian analytics event: missing thread connection metadata"
);
return;
};
let Some(connection_state) = self.connections.get(connection_id) else {
tracing::warn!(
thread_id = %input.thread_id,
turn_id = %input.turn_id,
review_id = %input.review_id,
connection_id,
"dropping guardian analytics event: missing connection metadata"
);
let Some(connection_state) =
self.thread_connection_or_warn(AnalyticsDropSite::guardian(&input))
else {
return;
};
out.push(TrackEventRequest::GuardianReview(Box::new(
@@ -675,10 +754,13 @@ impl AnalyticsReducer {
};
let thread_metadata =
ThreadMetadataState::from_thread_metadata(&thread_source, initialization_mode);
self.thread_connections
.insert(thread_id.clone(), connection_id);
self.thread_metadata
.insert(thread_id.clone(), thread_metadata.clone());
self.threads.insert(
thread_id.clone(),
ThreadAnalyticsState {
connection_id: Some(connection_id),
metadata: Some(thread_metadata.clone()),
},
);
out.push(TrackEventRequest::ThreadInitialized(
ThreadInitializedEvent {
event_type: "codex_thread_initialized",
@@ -699,29 +781,9 @@ impl AnalyticsReducer {
}
fn ingest_compaction(&mut self, input: CodexCompactionEvent, out: &mut Vec<TrackEventRequest>) {
let Some(connection_id) = self.thread_connections.get(&input.thread_id) else {
tracing::warn!(
thread_id = %input.thread_id,
turn_id = %input.turn_id,
"dropping compaction analytics event: missing thread connection metadata"
);
return;
};
let Some(connection_state) = self.connections.get(connection_id) else {
tracing::warn!(
thread_id = %input.thread_id,
turn_id = %input.turn_id,
connection_id,
"dropping compaction analytics event: missing connection metadata"
);
return;
};
let Some(thread_metadata) = self.thread_metadata.get(&input.thread_id) else {
tracing::warn!(
thread_id = %input.thread_id,
turn_id = %input.turn_id,
"dropping compaction analytics event: missing thread lifecycle metadata"
);
let Some((connection_state, thread_metadata)) =
self.thread_context_or_warn(AnalyticsDropSite::compaction(&input))
else {
return;
};
out.push(TrackEventRequest::Compaction(Box::new(
@@ -776,11 +838,13 @@ impl AnalyticsReducer {
let Some(connection_state) = self.connections.get(&connection_id) else {
return;
};
let Some(thread_metadata) = self.thread_metadata.get(&pending_request.thread_id) else {
tracing::warn!(
thread_id = %pending_request.thread_id,
"dropping turn steer analytics event: missing thread lifecycle metadata"
);
let drop_site = AnalyticsDropSite::turn_steer(&pending_request.thread_id);
let Some(thread_metadata) = self
.threads
.get(drop_site.thread_id)
.and_then(|thread| thread.metadata.as_ref())
else {
warn_missing_analytics_context(&drop_site, MissingAnalyticsContext::ThreadMetadata);
return;
};
out.push(TrackEventRequest::TurnSteer(CodexTurnSteerEventRequest {
@@ -813,42 +877,34 @@ impl AnalyticsReducer {
{
return;
}
let connection_metadata = turn_state
.connection_id
.and_then(|connection_id| self.connections.get(&connection_id))
.map(|connection_state| {
(
connection_state.app_server_client.clone(),
connection_state.runtime.clone(),
)
});
let Some((app_server_client, runtime)) = connection_metadata else {
if let Some(connection_id) = turn_state.connection_id {
tracing::warn!(
turn_id,
connection_id,
"dropping turn analytics event: missing connection metadata"
);
}
return;
};
let Some(thread_id) = turn_state.thread_id.as_ref() else {
return;
};
let Some(thread_metadata) = self.thread_metadata.get(thread_id) else {
tracing::warn!(
thread_id,
turn_id,
"dropping turn analytics event: missing thread lifecycle metadata"
let Some(connection_id) = turn_state.connection_id else {
return;
};
let Some(connection_state) = self.connections.get(&connection_id) else {
warn_missing_analytics_context(
&AnalyticsDropSite::turn(thread_id, turn_id),
MissingAnalyticsContext::Connection { connection_id },
);
return;
};
let drop_site = AnalyticsDropSite::turn(thread_id, turn_id);
let Some(thread_metadata) = self
.threads
.get(drop_site.thread_id)
.and_then(|thread| thread.metadata.as_ref())
else {
warn_missing_analytics_context(&drop_site, MissingAnalyticsContext::ThreadMetadata);
return;
};
out.push(TrackEventRequest::TurnEvent(Box::new(
CodexTurnEventRequest {
event_type: "codex_turn_event",
event_params: codex_turn_event_params(
app_server_client,
runtime,
connection_state.app_server_client.clone(),
connection_state.runtime.clone(),
turn_id.to_string(),
turn_state,
thread_metadata,
@@ -857,6 +913,67 @@ impl AnalyticsReducer {
)));
self.turns.remove(turn_id);
}
fn thread_connection_or_warn(
&self,
drop_site: AnalyticsDropSite<'_>,
) -> Option<&ConnectionState> {
let Some(thread_state) = self.threads.get(drop_site.thread_id) else {
warn_missing_analytics_context(&drop_site, MissingAnalyticsContext::ThreadConnection);
return None;
};
let Some(connection_id) = thread_state.connection_id else {
warn_missing_analytics_context(&drop_site, MissingAnalyticsContext::ThreadConnection);
return None;
};
let Some(connection_state) = self.connections.get(&connection_id) else {
warn_missing_analytics_context(
&drop_site,
MissingAnalyticsContext::Connection { connection_id },
);
return None;
};
Some(connection_state)
}
fn thread_context_or_warn(
&self,
drop_site: AnalyticsDropSite<'_>,
) -> Option<(&ConnectionState, &ThreadMetadataState)> {
let connection_state = self.thread_connection_or_warn(drop_site)?;
let Some(thread_metadata) = self
.threads
.get(drop_site.thread_id)
.and_then(|thread| thread.metadata.as_ref())
else {
warn_missing_analytics_context(&drop_site, MissingAnalyticsContext::ThreadMetadata);
return None;
};
Some((connection_state, thread_metadata))
}
}
fn warn_missing_analytics_context(
drop_site: &AnalyticsDropSite<'_>,
missing: MissingAnalyticsContext,
) {
let (missing_context, connection_id) = match missing {
MissingAnalyticsContext::ThreadConnection => ("thread_connection", None),
MissingAnalyticsContext::Connection { connection_id } => {
("connection", Some(connection_id))
}
MissingAnalyticsContext::ThreadMetadata => ("thread_metadata", None),
};
tracing::warn!(
thread_id = %drop_site.thread_id,
turn_id = ?drop_site.turn_id,
review_id = ?drop_site.review_id,
item_id = ?drop_site.item_id,
missing_context,
connection_id,
"dropping {} analytics event: missing analytics context",
drop_site.event_name
);
}
fn codex_turn_event_params(
@@ -884,7 +1001,8 @@ fn codex_turn_event_params(
session_source: _session_source,
model,
model_provider,
sandbox_policy,
permission_profile,
permission_profile_cwd,
reasoning_effort,
reasoning_summary,
service_tier,
@@ -909,7 +1027,10 @@ fn codex_turn_event_params(
parent_thread_id: thread_metadata.parent_thread_id.clone(),
model: Some(model),
model_provider,
sandbox_policy: Some(sandbox_policy_mode(&sandbox_policy)),
sandbox_policy: Some(sandbox_policy_mode(
&permission_profile,
permission_profile_cwd.as_path(),
)),
reasoning_effort: reasoning_effort.map(|value| value.to_string()),
reasoning_summary: reasoning_summary_mode(reasoning_summary),
service_tier: service_tier
@@ -954,12 +1075,27 @@ fn codex_turn_event_params(
}
}
fn sandbox_policy_mode(sandbox_policy: &SandboxPolicy) -> &'static str {
match sandbox_policy {
SandboxPolicy::DangerFullAccess => "full_access",
SandboxPolicy::ReadOnly { .. } => "read_only",
SandboxPolicy::WorkspaceWrite { .. } => "workspace_write",
SandboxPolicy::ExternalSandbox { .. } => "external_sandbox",
fn sandbox_policy_mode(permission_profile: &PermissionProfile, cwd: &Path) -> &'static str {
match permission_profile {
PermissionProfile::Disabled => "full_access",
PermissionProfile::External { .. } => "external_sandbox",
PermissionProfile::Managed { .. } => {
let file_system_policy = permission_profile.file_system_sandbox_policy();
if file_system_policy.has_full_disk_write_access() {
if permission_profile.network_sandbox_policy().is_enabled() {
"full_access"
} else {
"external_sandbox"
}
} else if file_system_policy
.get_writable_roots_with_cwd(cwd)
.is_empty()
{
"read_only"
} else {
"workspace_write"
}
}
}
}
@@ -1050,3 +1186,25 @@ pub(crate) fn normalize_path_for_skill_id(
_ => resolved_path.to_string_lossy().replace('\\', "/"),
}
}
#[cfg(test)]
mod tests {
use super::*;
use codex_protocol::models::SandboxEnforcement;
use codex_protocol::permissions::FileSystemSandboxPolicy;
use codex_protocol::permissions::NetworkSandboxPolicy;
#[test]
fn managed_full_disk_with_restricted_network_reports_external_sandbox() {
let permission_profile = PermissionProfile::from_runtime_permissions_with_enforcement(
SandboxEnforcement::Managed,
&FileSystemSandboxPolicy::unrestricted(),
NetworkSandboxPolicy::Restricted,
);
assert_eq!(
sandbox_policy_mode(&permission_profile, Path::new("/")),
"external_sandbox"
);
}
}

View File

@@ -41,11 +41,14 @@ use codex_app_server_protocol::Result as JsonRpcResult;
use codex_app_server_protocol::ServerNotification;
use codex_app_server_protocol::ServerRequest;
use codex_arg0::Arg0DispatchPaths;
use codex_config::CloudRequirementsLoader;
use codex_config::LoaderOverrides;
use codex_config::NoopThreadConfigLoader;
use codex_config::RemoteThreadConfigLoader;
use codex_config::ThreadConfigLoader;
use codex_core::config::Config;
use codex_core::config_loader::CloudRequirementsLoader;
use codex_core::config_loader::LoaderOverrides;
pub use codex_exec_server::EnvironmentManager;
pub use codex_exec_server::EnvironmentManagerArgs;
pub use codex_exec_server::ExecServerRuntimePaths;
use codex_feedback::CodexFeedback;
use codex_protocol::protocol::SessionSource;
@@ -96,10 +99,6 @@ pub mod legacy_core {
pub use codex_core::personality_migration::*;
}
pub mod plugins {
pub use codex_core::plugins::*;
}
pub mod review_format {
pub use codex_core::review_format::*;
}
@@ -301,7 +300,15 @@ impl fmt::Display for TypedRequestError {
write!(f, "{method} transport error: {source}")
}
Self::Server { method, source } => {
write!(f, "{method} failed: {}", source.message)
write!(
f,
"{method} failed: {} (code {})",
source.message, source.code
)?;
if let Some(data) = source.data.as_ref() {
write!(f, ", data: {data}")?;
}
Ok(())
}
Self::Deserialize { method, source } => {
write!(f, "{method} response decode error: {source}")
@@ -356,6 +363,13 @@ pub struct InProcessClientStartArgs {
pub channel_capacity: usize,
}
fn configured_thread_config_loader(config: &Config) -> Arc<dyn ThreadConfigLoader> {
match config.experimental_thread_config_endpoint.as_deref() {
Some(endpoint) => Arc::new(RemoteThreadConfigLoader::new(endpoint)),
None => Arc::new(NoopThreadConfigLoader),
}
}
impl InProcessClientStartArgs {
/// Builds initialize params from caller-provided metadata.
pub fn initialize_params(&self) -> InitializeParams {
@@ -380,13 +394,14 @@ impl InProcessClientStartArgs {
fn into_runtime_start_args(self) -> InProcessStartArgs {
let initialize = self.initialize_params();
let thread_config_loader = configured_thread_config_loader(&self.config);
InProcessStartArgs {
arg0_paths: self.arg0_paths,
config: self.config,
cli_overrides: self.cli_overrides,
loader_overrides: self.loader_overrides,
cloud_requirements: self.cloud_requirements,
thread_config_loader: Arc::new(NoopThreadConfigLoader),
thread_config_loader,
feedback: self.feedback,
log_db: self.log_db,
environment_manager: self.environment_manager,
@@ -968,7 +983,7 @@ mod tests {
cloud_requirements: CloudRequirementsLoader::default(),
feedback: CodexFeedback::new(),
log_db: None,
environment_manager: Arc::new(EnvironmentManager::new(/*exec_server_url*/ None)),
environment_manager: Arc::new(EnvironmentManager::default_for_tests()),
config_warnings: Vec::new(),
session_source,
enable_codex_api_key_env: false,
@@ -1385,6 +1400,55 @@ mod tests {
client.shutdown().await.expect("shutdown should complete");
}
#[tokio::test]
async fn remote_typed_request_accepts_large_single_frame_response() {
let padding = "x".repeat((17 << 20) + 1024);
let websocket_url = start_test_remote_server(move |mut websocket| async move {
expect_remote_initialize(&mut websocket).await;
let JSONRPCMessage::Request(request) = read_websocket_message(&mut websocket).await
else {
panic!("expected account/read request");
};
assert_eq!(request.method, "account/read");
write_websocket_message(
&mut websocket,
JSONRPCMessage::Response(JSONRPCResponse {
id: request.id,
result: serde_json::json!({
"account": null,
"requiresOpenaiAuth": false,
"padding": padding,
}),
}),
)
.await;
websocket.close(None).await.expect("close should succeed");
})
.await;
let client = RemoteAppServerClient::connect(test_remote_connect_args(websocket_url))
.await
.expect("remote client should connect");
let response: GetAccountResponse = client
.request_typed(ClientRequest::GetAccount {
request_id: RequestId::Integer(1),
params: codex_app_server_protocol::GetAccountParams {
refresh_token: false,
},
})
.await
.expect("large typed request should succeed");
assert_eq!(
response,
GetAccountResponse {
account: None,
requires_openai_auth: false,
}
);
client.shutdown().await.expect("shutdown should complete");
}
#[tokio::test]
async fn remote_connect_includes_auth_header_when_configured() {
let auth_token = "remote-bearer-token".to_string();
@@ -1859,11 +1923,15 @@ mod tests {
method: "thread/read".to_string(),
source: JSONRPCErrorError {
code: -32603,
data: None,
data: Some(serde_json::json!({"detail": "config lock mismatch"})),
message: "internal".to_string(),
},
};
assert_eq!(std::error::Error::source(&server).is_some(), false);
assert_eq!(
server.to_string(),
"thread/read failed: internal (code -32603), data: {\"detail\":\"config lock mismatch\"}"
);
let deserialize = TypedRequestError::Deserialize {
method: "thread/start".to_string(),
@@ -1969,9 +2037,17 @@ mod tests {
#[tokio::test]
async fn runtime_start_args_forward_environment_manager() {
let config = Arc::new(build_test_config().await);
let environment_manager = Arc::new(EnvironmentManager::new(Some(
"ws://127.0.0.1:8765".to_string(),
)));
let environment_manager = Arc::new(
EnvironmentManager::create_for_tests(
Some("ws://127.0.0.1:8765".to_string()),
ExecServerRuntimePaths::new(
std::env::current_exe().expect("current exe"),
/*codex_linux_sandbox_exe*/ None,
)
.expect("runtime paths"),
)
.await,
);
let runtime_args = InProcessClientStartArgs {
arg0_paths: Arg0DispatchPaths::default(),
@@ -1998,7 +2074,49 @@ mod tests {
&runtime_args.environment_manager,
&environment_manager
));
assert!(runtime_args.environment_manager.is_remote());
assert!(
runtime_args
.environment_manager
.default_environment()
.expect("default environment")
.is_remote()
);
}
#[tokio::test]
async fn runtime_start_args_use_remote_thread_config_loader_when_configured() {
let mut config = build_test_config().await;
config.experimental_thread_config_endpoint = Some("not-a-valid-endpoint".to_string());
let runtime_args = InProcessClientStartArgs {
arg0_paths: Arg0DispatchPaths::default(),
config: Arc::new(config),
cli_overrides: Vec::new(),
loader_overrides: LoaderOverrides::default(),
cloud_requirements: CloudRequirementsLoader::default(),
feedback: CodexFeedback::new(),
log_db: None,
environment_manager: Arc::new(EnvironmentManager::default_for_tests()),
config_warnings: Vec::new(),
session_source: SessionSource::Exec,
enable_codex_api_key_env: false,
client_name: "codex-app-server-client-test".to_string(),
client_version: "0.0.0-test".to_string(),
experimental_api: true,
opt_out_notification_methods: Vec::new(),
channel_capacity: DEFAULT_IN_PROCESS_CHANNEL_CAPACITY,
}
.into_runtime_start_args();
let err = runtime_args
.thread_config_loader
.load(Default::default())
.await
.expect_err("configured remote loader should try to connect");
assert_eq!(
err.code(),
codex_config::ThreadConfigLoadErrorCode::RequestFailed
);
}
#[tokio::test]

View File

@@ -20,7 +20,6 @@ use crate::RequestResult;
use crate::SHUTDOWN_TIMEOUT;
use crate::TypedRequestError;
use crate::request_method_name;
use crate::server_notification_requires_delivery;
use codex_app_server_protocol::ClientInfo;
use codex_app_server_protocol::ClientNotification;
use codex_app_server_protocol::ClientRequest;
@@ -46,16 +45,18 @@ use tokio::sync::oneshot;
use tokio::time::timeout;
use tokio_tungstenite::MaybeTlsStream;
use tokio_tungstenite::WebSocketStream;
use tokio_tungstenite::connect_async;
use tokio_tungstenite::connect_async_with_config;
use tokio_tungstenite::tungstenite::Message;
use tokio_tungstenite::tungstenite::client::IntoClientRequest;
use tokio_tungstenite::tungstenite::http::HeaderValue;
use tokio_tungstenite::tungstenite::http::header::AUTHORIZATION;
use tokio_tungstenite::tungstenite::protocol::WebSocketConfig;
use tracing::warn;
use url::Url;
const CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
const INITIALIZE_TIMEOUT: Duration = Duration::from_secs(10);
const REMOTE_APP_SERVER_MAX_WEBSOCKET_MESSAGE_SIZE: usize = 128 << 20;
#[derive(Debug, Clone)]
pub struct RemoteAppServerConnectArgs {
@@ -126,7 +127,7 @@ enum RemoteClientCommand {
pub struct RemoteAppServerClient {
command_tx: mpsc::Sender<RemoteClientCommand>,
event_rx: mpsc::Receiver<AppServerEvent>,
event_rx: mpsc::UnboundedReceiver<AppServerEvent>,
pending_events: VecDeque<AppServerEvent>,
worker_handle: tokio::task::JoinHandle<()>,
}
@@ -171,20 +172,32 @@ impl RemoteAppServerClient {
request.headers_mut().insert(AUTHORIZATION, header_value);
}
ensure_rustls_crypto_provider();
let stream = timeout(CONNECT_TIMEOUT, connect_async(request))
.await
.map_err(|_| {
IoError::new(
ErrorKind::TimedOut,
format!("timed out connecting to remote app server at `{websocket_url}`"),
)
})?
.map(|(stream, _response)| stream)
.map_err(|err| {
IoError::other(format!(
"failed to connect to remote app server at `{websocket_url}`: {err}"
))
})?;
// Remote resume responses can legitimately carry large thread histories.
// Keep a bounded cap, but raise it above tungstenite's 16 MiB frame default.
let websocket_config = WebSocketConfig::default()
.max_frame_size(Some(REMOTE_APP_SERVER_MAX_WEBSOCKET_MESSAGE_SIZE))
.max_message_size(Some(REMOTE_APP_SERVER_MAX_WEBSOCKET_MESSAGE_SIZE));
let stream = timeout(
CONNECT_TIMEOUT,
connect_async_with_config(
request,
Some(websocket_config),
/*disable_nagle*/ false,
),
)
.await
.map_err(|_| {
IoError::new(
ErrorKind::TimedOut,
format!("timed out connecting to remote app server at `{websocket_url}`"),
)
})?
.map(|(stream, _response)| stream)
.map_err(|err| {
IoError::other(format!(
"failed to connect to remote app server at `{websocket_url}`: {err}"
))
})?;
let mut stream = stream;
let pending_events = initialize_remote_connection(
&mut stream,
@@ -195,11 +208,11 @@ impl RemoteAppServerClient {
.await?;
let (command_tx, mut command_rx) = mpsc::channel::<RemoteClientCommand>(channel_capacity);
let (event_tx, event_rx) = mpsc::channel::<AppServerEvent>(channel_capacity);
let (event_tx, event_rx) = mpsc::unbounded_channel::<AppServerEvent>();
let worker_handle = tokio::spawn(async move {
let mut pending_requests =
HashMap::<RequestId, oneshot::Sender<IoResult<RequestResult>>>::new();
let mut skipped_events = 0usize;
let mut worker_exit_error: Option<(ErrorKind, String)> = None;
loop {
tokio::select! {
command = command_rx.recv() => {
@@ -226,20 +239,19 @@ impl RemoteAppServerClient {
.await
{
let err_message = err.to_string();
let message = format!(
"remote app server at `{websocket_url}` write failed: {err_message}"
);
if let Some(response_tx) = pending_requests.remove(&request_id) {
let _ = response_tx.send(Err(err));
}
let _ = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::Disconnected {
message: format!(
"remote app server at `{websocket_url}` write failed: {err_message}"
),
message: message.clone(),
},
&mut stream,
)
.await;
);
worker_exit_error = Some((ErrorKind::BrokenPipe, message));
break;
}
}
@@ -316,11 +328,8 @@ impl RemoteAppServerClient {
app_server_event_from_notification(notification)
&& let Err(err) = deliver_event(
&event_tx,
&mut skipped_events,
event,
&mut stream,
)
.await
{
warn!(%err, "failed to deliver remote app-server event");
break;
@@ -333,11 +342,8 @@ impl RemoteAppServerClient {
Ok(request) => {
if let Err(err) = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::ServerRequest(request),
&mut stream,
)
.await
{
warn!(%err, "failed to deliver remote app-server server request");
break;
@@ -362,34 +368,34 @@ impl RemoteAppServerClient {
.await
{
let err_message = reject_err.to_string();
let message = format!(
"remote app server at `{websocket_url}` write failed: {err_message}"
);
let _ = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::Disconnected {
message: format!(
"remote app server at `{websocket_url}` write failed: {err_message}"
),
message: message.clone(),
},
&mut stream,
)
.await;
);
worker_exit_error =
Some((ErrorKind::BrokenPipe, message));
break;
}
}
}
}
Err(err) => {
let message = format!(
"remote app server at `{websocket_url}` sent invalid JSON-RPC: {err}"
);
let _ = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::Disconnected {
message: format!(
"remote app server at `{websocket_url}` sent invalid JSON-RPC: {err}"
),
message: message.clone(),
},
&mut stream,
)
.await;
);
worker_exit_error =
Some((ErrorKind::InvalidData, message));
break;
}
}
@@ -400,17 +406,19 @@ impl RemoteAppServerClient {
.map(|frame| frame.reason.to_string())
.filter(|reason| !reason.is_empty())
.unwrap_or_else(|| "connection closed".to_string());
let message = format!(
"remote app server at `{websocket_url}` disconnected: {reason}"
);
let _ = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::Disconnected {
message: format!(
"remote app server at `{websocket_url}` disconnected: {reason}"
),
message: message.clone(),
},
&mut stream,
)
.await;
);
worker_exit_error = Some((
ErrorKind::ConnectionAborted,
message,
));
break;
}
Some(Ok(Message::Binary(_)))
@@ -418,31 +426,29 @@ impl RemoteAppServerClient {
| Some(Ok(Message::Pong(_)))
| Some(Ok(Message::Frame(_))) => {}
Some(Err(err)) => {
let message = format!(
"remote app server at `{websocket_url}` transport failed: {err}"
);
let _ = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::Disconnected {
message: format!(
"remote app server at `{websocket_url}` transport failed: {err}"
),
message: message.clone(),
},
&mut stream,
)
.await;
);
worker_exit_error = Some((ErrorKind::InvalidData, message));
break;
}
None => {
let message = format!(
"remote app server at `{websocket_url}` closed the connection"
);
let _ = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::Disconnected {
message: format!(
"remote app server at `{websocket_url}` closed the connection"
),
message: message.clone(),
},
&mut stream,
)
.await;
);
worker_exit_error = Some((ErrorKind::UnexpectedEof, message));
break;
}
}
@@ -450,12 +456,14 @@ impl RemoteAppServerClient {
}
}
let err = IoError::new(
ErrorKind::BrokenPipe,
"remote app-server worker channel is closed",
);
let (err_kind, err_message) = worker_exit_error.unwrap_or_else(|| {
(
ErrorKind::BrokenPipe,
"remote app-server worker channel is closed".to_string(),
)
});
for (_, response_tx) in pending_requests {
let _ = response_tx.send(Err(IoError::new(err.kind(), err.to_string())));
let _ = response_tx.send(Err(IoError::new(err_kind, err_message.clone())));
}
});
@@ -612,14 +620,9 @@ impl RemoteAppServerClient {
.send(RemoteClientCommand::Shutdown { response_tx })
.await
.is_ok()
&& let Ok(command_result) = timeout(SHUTDOWN_TIMEOUT, response_rx).await
&& let Ok(Ok(close_result)) = timeout(SHUTDOWN_TIMEOUT, response_rx).await
{
command_result.map_err(|_| {
IoError::new(
ErrorKind::BrokenPipe,
"remote app-server shutdown channel is closed",
)
})??;
close_result?;
}
if let Err(_elapsed) = timeout(SHUTDOWN_TIMEOUT, &mut worker_handle).await {
@@ -806,100 +809,16 @@ fn app_server_event_from_notification(notification: JSONRPCNotification) -> Opti
}
}
async fn deliver_event(
event_tx: &mpsc::Sender<AppServerEvent>,
skipped_events: &mut usize,
fn deliver_event(
event_tx: &mpsc::UnboundedSender<AppServerEvent>,
event: AppServerEvent,
stream: &mut WebSocketStream<MaybeTlsStream<TcpStream>>,
) -> IoResult<()> {
if *skipped_events > 0 {
if event_requires_delivery(&event) {
if event_tx
.send(AppServerEvent::Lagged {
skipped: *skipped_events,
})
.await
.is_err()
{
return Err(IoError::new(
ErrorKind::BrokenPipe,
"remote app-server event consumer channel is closed",
));
}
*skipped_events = 0;
} else {
match event_tx.try_send(AppServerEvent::Lagged {
skipped: *skipped_events,
}) {
Ok(()) => *skipped_events = 0,
Err(mpsc::error::TrySendError::Full(_)) => {
*skipped_events = (*skipped_events).saturating_add(1);
reject_if_server_request_dropped(stream, &event).await?;
return Ok(());
}
Err(mpsc::error::TrySendError::Closed(_)) => {
return Err(IoError::new(
ErrorKind::BrokenPipe,
"remote app-server event consumer channel is closed",
));
}
}
}
}
if event_requires_delivery(&event) {
event_tx.send(event).await.map_err(|_| {
IoError::new(
ErrorKind::BrokenPipe,
"remote app-server event consumer channel is closed",
)
})?;
return Ok(());
}
match event_tx.try_send(event) {
Ok(()) => Ok(()),
Err(mpsc::error::TrySendError::Full(event)) => {
*skipped_events = (*skipped_events).saturating_add(1);
reject_if_server_request_dropped(stream, &event).await
}
Err(mpsc::error::TrySendError::Closed(_)) => Err(IoError::new(
event_tx.send(event).map_err(|_| {
IoError::new(
ErrorKind::BrokenPipe,
"remote app-server event consumer channel is closed",
)),
}
}
async fn reject_if_server_request_dropped(
stream: &mut WebSocketStream<MaybeTlsStream<TcpStream>>,
event: &AppServerEvent,
) -> IoResult<()> {
let AppServerEvent::ServerRequest(request) = event else {
return Ok(());
};
write_jsonrpc_message(
stream,
JSONRPCMessage::Error(JSONRPCError {
error: JSONRPCErrorError {
code: -32001,
message: "remote app-server event queue is full".to_string(),
data: None,
},
id: request.id().clone(),
}),
"<remote-app-server>",
)
.await
}
fn event_requires_delivery(event: &AppServerEvent) -> bool {
match event {
AppServerEvent::ServerNotification(notification) => {
server_notification_requires_delivery(notification)
}
AppServerEvent::Disconnected { .. } => true,
AppServerEvent::Lagged { .. } | AppServerEvent::ServerRequest(_) => false,
}
)
})
}
fn request_id_from_client_request(request: &ClientRequest) -> RequestId {
@@ -945,40 +864,27 @@ async fn write_jsonrpc_message(
))
})
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn event_requires_delivery_marks_transcript_and_disconnect_events() {
assert!(event_requires_delivery(
&AppServerEvent::ServerNotification(ServerNotification::AgentMessageDelta(
codex_app_server_protocol::AgentMessageDeltaNotification {
thread_id: "thread".to_string(),
turn_id: "turn".to_string(),
item_id: "item".to_string(),
delta: "hello".to_string(),
},
),)
));
assert!(event_requires_delivery(
&AppServerEvent::ServerNotification(ServerNotification::ItemCompleted(
codex_app_server_protocol::ItemCompletedNotification {
thread_id: "thread".to_string(),
turn_id: "turn".to_string(),
item: codex_app_server_protocol::ThreadItem::Plan {
id: "item".to_string(),
text: "step".to_string(),
},
}
),)
));
assert!(event_requires_delivery(&AppServerEvent::Disconnected {
message: "closed".to_string(),
}));
assert!(!event_requires_delivery(&AppServerEvent::Lagged {
skipped: 1
}));
#[tokio::test]
async fn shutdown_tolerates_worker_exit_after_command_is_queued() {
let (command_tx, mut command_rx) = mpsc::channel(1);
let (_event_tx, event_rx) = mpsc::unbounded_channel::<AppServerEvent>();
let worker_handle = tokio::spawn(async move {
let _ = command_rx.recv().await;
});
let client = RemoteAppServerClient {
command_tx,
event_rx,
pending_events: VecDeque::new(),
worker_handle,
};
client
.shutdown()
.await
.expect("shutdown should complete when worker exits first");
}
}

File diff suppressed because it is too large Load Diff

View File

@@ -25,6 +25,7 @@
]
},
"read": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -34,6 +35,7 @@
]
},
"write": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -76,7 +78,8 @@
{
"type": "null"
}
]
],
"description": "Partial overlay used for per-command permission requests."
}
},
"type": "object"
@@ -389,21 +392,6 @@
"title": "MinimalFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"current_working_directory"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "CurrentWorkingDirectoryFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {

View File

@@ -25,6 +25,7 @@
]
},
"read": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -34,6 +35,7 @@
]
},
"write": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -175,21 +177,6 @@
"title": "MinimalFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"current_working_directory"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "CurrentWorkingDirectoryFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
@@ -295,6 +282,9 @@
}
},
"properties": {
"cwd": {
"$ref": "#/definitions/AbsolutePathBuf"
},
"itemId": {
"type": "string"
},
@@ -315,6 +305,7 @@
}
},
"required": [
"cwd",
"itemId",
"permissions",
"threadId",

View File

@@ -25,6 +25,7 @@
]
},
"read": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -34,6 +35,7 @@
]
},
"write": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -175,21 +177,6 @@
"title": "MinimalFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"current_working_directory"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "CurrentWorkingDirectoryFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
@@ -311,6 +298,13 @@
}
],
"default": "turn"
},
"strictAutoReview": {
"description": "Review every subsequent command in this turn before normal sandboxed execution.",
"type": [
"boolean",
"null"
]
}
},
"required": [

View File

@@ -84,6 +84,7 @@
]
},
"read": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -93,6 +94,7 @@
]
},
"write": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -436,6 +438,13 @@
"chatgptAuthTokens"
],
"type": "string"
},
{
"description": "Programmatic Codex auth backed by a registered Agent Identity.",
"enum": [
"agentIdentity"
],
"type": "string"
}
]
},
@@ -473,6 +482,7 @@
"contextWindowExceeded",
"usageLimitExceeded",
"serverOverloaded",
"cyberPolicy",
"internalServerError",
"unauthorized",
"badRequest",
@@ -1022,6 +1032,7 @@
"type": "object"
},
"FileChangeOutputDeltaNotification": {
"description": "Deprecated legacy notification for `apply_patch` textual output.\n\nThe server no longer emits this notification.",
"properties": {
"delta": {
"type": "string"
@@ -1189,21 +1200,6 @@
"title": "MinimalFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"current_working_directory"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "CurrentWorkingDirectoryFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
@@ -1697,6 +1693,23 @@
],
"type": "string"
},
"GuardianWarningNotification": {
"properties": {
"message": {
"description": "Concise guardian warning message for the user.",
"type": "string"
},
"threadId": {
"description": "Thread target for the guardian warning.",
"type": "string"
}
},
"required": [
"message",
"threadId"
],
"type": "object"
},
"HookCompletedNotification": {
"properties": {
"run": {
@@ -1888,6 +1901,8 @@
"project",
"mdm",
"sessionFlags",
"plugin",
"cloudRequirements",
"legacyManagedConfigFile",
"legacyManagedConfigMdm",
"unknown"
@@ -2236,6 +2251,34 @@
],
"type": "object"
},
"ModelVerification": {
"enum": [
"trustedAccessForCyber"
],
"type": "string"
},
"ModelVerificationNotification": {
"properties": {
"threadId": {
"type": "string"
},
"turnId": {
"type": "string"
},
"verifications": {
"items": {
"$ref": "#/definitions/ModelVerification"
},
"type": "array"
}
},
"required": [
"threadId",
"turnId",
"verifications"
],
"type": "object"
},
"NetworkApprovalProtocol": {
"enum": [
"http",
@@ -2562,6 +2605,33 @@
],
"type": "object"
},
"RemoteControlConnectionStatus": {
"enum": [
"disabled",
"connecting",
"connected",
"errored"
],
"type": "string"
},
"RemoteControlStatusChangedNotification": {
"description": "Current remote-control connection status and environment id exposed to clients.",
"properties": {
"environmentId": {
"type": [
"string",
"null"
]
},
"status": {
"$ref": "#/definitions/RemoteControlConnectionStatus"
}
},
"required": [
"status"
],
"type": "object"
},
"RequestId": {
"anyOf": [
{
@@ -2973,6 +3043,93 @@
],
"type": "object"
},
"ThreadGoal": {
"properties": {
"createdAt": {
"format": "int64",
"type": "integer"
},
"objective": {
"type": "string"
},
"status": {
"$ref": "#/definitions/ThreadGoalStatus"
},
"threadId": {
"type": "string"
},
"timeUsedSeconds": {
"format": "int64",
"type": "integer"
},
"tokenBudget": {
"format": "int64",
"type": [
"integer",
"null"
]
},
"tokensUsed": {
"format": "int64",
"type": "integer"
},
"updatedAt": {
"format": "int64",
"type": "integer"
}
},
"required": [
"createdAt",
"objective",
"status",
"threadId",
"timeUsedSeconds",
"tokensUsed",
"updatedAt"
],
"type": "object"
},
"ThreadGoalClearedNotification": {
"properties": {
"threadId": {
"type": "string"
}
},
"required": [
"threadId"
],
"type": "object"
},
"ThreadGoalStatus": {
"enum": [
"active",
"paused",
"budgetLimited",
"complete"
],
"type": "string"
},
"ThreadGoalUpdatedNotification": {
"properties": {
"goal": {
"$ref": "#/definitions/ThreadGoal"
},
"threadId": {
"type": "string"
},
"turnId": {
"type": [
"string",
"null"
]
}
},
"required": [
"goal",
"threadId"
],
"type": "object"
},
"ThreadId": {
"type": "string"
},
@@ -3774,7 +3931,7 @@
"ThreadRealtimeStartedNotification": {
"description": "EXPERIMENTAL - emitted when thread realtime startup is accepted.",
"properties": {
"sessionId": {
"realtimeSessionId": {
"type": [
"string",
"null"
@@ -4672,6 +4829,46 @@
"title": "Thread/name/updatedNotification",
"type": "object"
},
{
"properties": {
"method": {
"enum": [
"thread/goal/updated"
],
"title": "Thread/goal/updatedNotificationMethod",
"type": "string"
},
"params": {
"$ref": "#/definitions/ThreadGoalUpdatedNotification"
}
},
"required": [
"method",
"params"
],
"title": "Thread/goal/updatedNotification",
"type": "object"
},
{
"properties": {
"method": {
"enum": [
"thread/goal/cleared"
],
"title": "Thread/goal/clearedNotificationMethod",
"type": "string"
},
"params": {
"$ref": "#/definitions/ThreadGoalClearedNotification"
}
},
"required": [
"method",
"params"
],
"title": "Thread/goal/clearedNotification",
"type": "object"
},
{
"properties": {
"method": {
@@ -4995,6 +5192,7 @@
"type": "object"
},
{
"description": "Deprecated legacy apply_patch output stream notification.",
"properties": {
"method": {
"enum": [
@@ -5174,6 +5372,26 @@
"title": "App/list/updatedNotification",
"type": "object"
},
{
"properties": {
"method": {
"enum": [
"remoteControl/status/changed"
],
"title": "RemoteControl/status/changedNotificationMethod",
"type": "string"
},
"params": {
"$ref": "#/definitions/RemoteControlStatusChangedNotification"
}
},
"required": [
"method",
"params"
],
"title": "RemoteControl/status/changedNotification",
"type": "object"
},
{
"properties": {
"method": {
@@ -5315,6 +5533,26 @@
"title": "Model/reroutedNotification",
"type": "object"
},
{
"properties": {
"method": {
"enum": [
"model/verification"
],
"title": "Model/verificationNotificationMethod",
"type": "string"
},
"params": {
"$ref": "#/definitions/ModelVerificationNotification"
}
},
"required": [
"method",
"params"
],
"title": "Model/verificationNotification",
"type": "object"
},
{
"properties": {
"method": {
@@ -5335,6 +5573,26 @@
"title": "WarningNotification",
"type": "object"
},
{
"properties": {
"method": {
"enum": [
"guardianWarning"
],
"title": "GuardianWarningNotificationMethod",
"type": "string"
},
"params": {
"$ref": "#/definitions/GuardianWarningNotification"
}
},
"required": [
"method",
"params"
],
"title": "GuardianWarningNotification",
"type": "object"
},
{
"properties": {
"method": {

View File

@@ -25,6 +25,7 @@
]
},
"read": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -34,6 +35,7 @@
]
},
"write": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -76,7 +78,8 @@
{
"type": "null"
}
]
],
"description": "Partial overlay used for per-command permission requests."
}
},
"type": "object"
@@ -728,21 +731,6 @@
"title": "MinimalFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"current_working_directory"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "CurrentWorkingDirectoryFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
@@ -1584,6 +1572,9 @@
},
"PermissionsRequestApprovalParams": {
"properties": {
"cwd": {
"$ref": "#/definitions/AbsolutePathBuf"
},
"itemId": {
"type": "string"
},
@@ -1604,6 +1595,7 @@
}
},
"required": [
"cwd",
"itemId",
"permissions",
"threadId",

View File

@@ -24,6 +24,13 @@
"chatgptAuthTokens"
],
"type": "string"
},
{
"description": "Programmatic Codex auth backed by a registered Agent Identity.",
"enum": [
"agentIdentity"
],
"type": "string"
}
]
},

View File

@@ -27,6 +27,202 @@
],
"type": "object"
},
"FileSystemAccessMode": {
"enum": [
"read",
"write",
"none"
],
"type": "string"
},
"FileSystemPath": {
"oneOf": [
{
"properties": {
"path": {
"$ref": "#/definitions/AbsolutePathBuf"
},
"type": {
"enum": [
"path"
],
"title": "PathFileSystemPathType",
"type": "string"
}
},
"required": [
"path",
"type"
],
"title": "PathFileSystemPath",
"type": "object"
},
{
"properties": {
"pattern": {
"type": "string"
},
"type": {
"enum": [
"glob_pattern"
],
"title": "GlobPatternFileSystemPathType",
"type": "string"
}
},
"required": [
"pattern",
"type"
],
"title": "GlobPatternFileSystemPath",
"type": "object"
},
{
"properties": {
"type": {
"enum": [
"special"
],
"title": "SpecialFileSystemPathType",
"type": "string"
},
"value": {
"$ref": "#/definitions/FileSystemSpecialPath"
}
},
"required": [
"type",
"value"
],
"title": "SpecialFileSystemPath",
"type": "object"
}
]
},
"FileSystemSandboxEntry": {
"properties": {
"access": {
"$ref": "#/definitions/FileSystemAccessMode"
},
"path": {
"$ref": "#/definitions/FileSystemPath"
}
},
"required": [
"access",
"path"
],
"type": "object"
},
"FileSystemSpecialPath": {
"oneOf": [
{
"properties": {
"kind": {
"enum": [
"root"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "RootFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"minimal"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "MinimalFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"project_roots"
],
"type": "string"
},
"subpath": {
"type": [
"string",
"null"
]
}
},
"required": [
"kind"
],
"title": "KindFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"tmpdir"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "TmpdirFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"slash_tmp"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "SlashTmpFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"unknown"
],
"type": "string"
},
"path": {
"type": "string"
},
"subpath": {
"type": [
"string",
"null"
]
}
},
"required": [
"kind",
"path"
],
"type": "object"
}
]
},
"NetworkAccess": {
"enum": [
"restricted",
@@ -34,53 +230,135 @@
],
"type": "string"
},
"ReadOnlyAccess": {
"PermissionProfile": {
"oneOf": [
{
"description": "Codex owns sandbox construction for this profile.",
"properties": {
"includePlatformDefaults": {
"default": true,
"type": "boolean"
"fileSystem": {
"$ref": "#/definitions/PermissionProfileFileSystemPermissions"
},
"readableRoots": {
"default": [],
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
"type": "array"
"network": {
"$ref": "#/definitions/PermissionProfileNetworkPermissions"
},
"type": {
"enum": [
"restricted"
"managed"
],
"title": "RestrictedReadOnlyAccessType",
"title": "ManagedPermissionProfileType",
"type": "string"
}
},
"required": [
"fileSystem",
"network",
"type"
],
"title": "ManagedPermissionProfile",
"type": "object"
},
{
"description": "Do not apply an outer sandbox.",
"properties": {
"type": {
"enum": [
"disabled"
],
"title": "DisabledPermissionProfileType",
"type": "string"
}
},
"required": [
"type"
],
"title": "RestrictedReadOnlyAccess",
"title": "DisabledPermissionProfile",
"type": "object"
},
{
"description": "Filesystem isolation is enforced by an external caller.",
"properties": {
"network": {
"$ref": "#/definitions/PermissionProfileNetworkPermissions"
},
"type": {
"enum": [
"external"
],
"title": "ExternalPermissionProfileType",
"type": "string"
}
},
"required": [
"network",
"type"
],
"title": "ExternalPermissionProfile",
"type": "object"
}
]
},
"PermissionProfileFileSystemPermissions": {
"oneOf": [
{
"properties": {
"entries": {
"items": {
"$ref": "#/definitions/FileSystemSandboxEntry"
},
"type": "array"
},
"globScanMaxDepth": {
"format": "uint",
"minimum": 1.0,
"type": [
"integer",
"null"
]
},
"type": {
"enum": [
"restricted"
],
"title": "RestrictedPermissionProfileFileSystemPermissionsType",
"type": "string"
}
},
"required": [
"entries",
"type"
],
"title": "RestrictedPermissionProfileFileSystemPermissions",
"type": "object"
},
{
"properties": {
"type": {
"enum": [
"fullAccess"
"unrestricted"
],
"title": "FullAccessReadOnlyAccessType",
"title": "UnrestrictedPermissionProfileFileSystemPermissionsType",
"type": "string"
}
},
"required": [
"type"
],
"title": "FullAccessReadOnlyAccess",
"title": "UnrestrictedPermissionProfileFileSystemPermissions",
"type": "object"
}
]
},
"PermissionProfileNetworkPermissions": {
"properties": {
"enabled": {
"type": "boolean"
}
},
"required": [
"enabled"
],
"type": "object"
},
"SandboxPolicy": {
"oneOf": [
{
@@ -101,16 +379,6 @@
},
{
"properties": {
"access": {
"allOf": [
{
"$ref": "#/definitions/ReadOnlyAccess"
}
],
"default": {
"type": "fullAccess"
}
},
"networkAccess": {
"default": false,
"type": "boolean"
@@ -167,16 +435,6 @@
"default": false,
"type": "boolean"
},
"readOnlyAccess": {
"allOf": [
{
"$ref": "#/definitions/ReadOnlyAccess"
}
],
"default": {
"type": "fullAccess"
}
},
"type": {
"enum": [
"workspaceWrite"
@@ -263,7 +521,7 @@
"type": "null"
}
],
"description": "Optional sandbox policy for this command.\n\nUses the same shape as thread/turn execution sandbox configuration and defaults to the user's configured policy when omitted."
"description": "Optional sandbox policy for this command.\n\nUses the same shape as thread/turn execution sandbox configuration and defaults to the user's configured policy when omitted. Cannot be combined with `permissionProfile`."
},
"size": {
"anyOf": [

View File

@@ -97,9 +97,10 @@
"type": "object"
},
"ApprovalsReviewer": {
"description": "Configures who approval requests are routed to for review. Examples include sandbox escapes, blocked network access, MCP approval prompts, and ARC escalations. Defaults to `user`. `guardian_subagent` uses a carefully prompted subagent to gather relevant context and apply a risk-based decision framework before approving or denying the request.",
"description": "Configures who approval requests are routed to for review. Examples include sandbox escapes, blocked network access, MCP approval prompts, and ARC escalations. Defaults to `user`. `auto_review` uses a carefully prompted subagent to gather relevant context and apply a risk-based decision framework before approving or denying the request. The legacy value `guardian_subagent` is accepted for compatibility.",
"enum": [
"user",
"auto_review",
"guardian_subagent"
],
"type": "string"

View File

@@ -2,9 +2,10 @@
"$schema": "http://json-schema.org/draft-07/schema#",
"definitions": {
"ApprovalsReviewer": {
"description": "Configures who approval requests are routed to for review. Examples include sandbox escapes, blocked network access, MCP approval prompts, and ARC escalations. Defaults to `user`. `guardian_subagent` uses a carefully prompted subagent to gather relevant context and apply a risk-based decision framework before approving or denying the request.",
"description": "Configures who approval requests are routed to for review. Examples include sandbox escapes, blocked network access, MCP approval prompts, and ARC escalations. Defaults to `user`. `auto_review` uses a carefully prompted subagent to gather relevant context and apply a risk-based decision framework before approving or denying the request. The legacy value `guardian_subagent` is accepted for compatibility.",
"enum": [
"user",
"auto_review",
"guardian_subagent"
],
"type": "string"
@@ -110,6 +111,161 @@
},
"type": "object"
},
"ConfiguredHookHandler": {
"oneOf": [
{
"properties": {
"async": {
"type": "boolean"
},
"command": {
"type": "string"
},
"statusMessage": {
"type": [
"string",
"null"
]
},
"timeoutSec": {
"format": "uint64",
"minimum": 0.0,
"type": [
"integer",
"null"
]
},
"type": {
"enum": [
"command"
],
"title": "CommandConfiguredHookHandlerType",
"type": "string"
}
},
"required": [
"async",
"command",
"type"
],
"title": "CommandConfiguredHookHandler",
"type": "object"
},
{
"properties": {
"type": {
"enum": [
"prompt"
],
"title": "PromptConfiguredHookHandlerType",
"type": "string"
}
},
"required": [
"type"
],
"title": "PromptConfiguredHookHandler",
"type": "object"
},
{
"properties": {
"type": {
"enum": [
"agent"
],
"title": "AgentConfiguredHookHandlerType",
"type": "string"
}
},
"required": [
"type"
],
"title": "AgentConfiguredHookHandler",
"type": "object"
}
]
},
"ConfiguredHookMatcherGroup": {
"properties": {
"hooks": {
"items": {
"$ref": "#/definitions/ConfiguredHookHandler"
},
"type": "array"
},
"matcher": {
"type": [
"string",
"null"
]
}
},
"required": [
"hooks"
],
"type": "object"
},
"ManagedHooksRequirements": {
"properties": {
"PermissionRequest": {
"items": {
"$ref": "#/definitions/ConfiguredHookMatcherGroup"
},
"type": "array"
},
"PostToolUse": {
"items": {
"$ref": "#/definitions/ConfiguredHookMatcherGroup"
},
"type": "array"
},
"PreToolUse": {
"items": {
"$ref": "#/definitions/ConfiguredHookMatcherGroup"
},
"type": "array"
},
"SessionStart": {
"items": {
"$ref": "#/definitions/ConfiguredHookMatcherGroup"
},
"type": "array"
},
"Stop": {
"items": {
"$ref": "#/definitions/ConfiguredHookMatcherGroup"
},
"type": "array"
},
"UserPromptSubmit": {
"items": {
"$ref": "#/definitions/ConfiguredHookMatcherGroup"
},
"type": "array"
},
"managedDir": {
"type": [
"string",
"null"
]
},
"windowsManagedDir": {
"type": [
"string",
"null"
]
}
},
"required": [
"PermissionRequest",
"PostToolUse",
"PreToolUse",
"SessionStart",
"Stop",
"UserPromptSubmit"
],
"type": "object"
},
"NetworkDomainPermission": {
"enum": [
"allow",

View File

@@ -9,6 +9,7 @@
"contextWindowExceeded",
"usageLimitExceeded",
"serverOverloaded",
"cyberPolicy",
"internalServerError",
"unauthorized",
"badRequest",

View File

@@ -1,6 +1,17 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"definitions": {
"CommandMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
},
"ExternalAgentConfigMigrationItem": {
"properties": {
"cwd": {
@@ -39,22 +50,81 @@
"CONFIG",
"SKILLS",
"PLUGINS",
"MCP_SERVER_CONFIG"
"MCP_SERVER_CONFIG",
"SUBAGENTS",
"HOOKS",
"COMMANDS",
"SESSIONS"
],
"type": "string"
},
"HookMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
},
"McpServerMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
},
"MigrationDetails": {
"properties": {
"commands": {
"default": [],
"items": {
"$ref": "#/definitions/CommandMigration"
},
"type": "array"
},
"hooks": {
"default": [],
"items": {
"$ref": "#/definitions/HookMigration"
},
"type": "array"
},
"mcpServers": {
"default": [],
"items": {
"$ref": "#/definitions/McpServerMigration"
},
"type": "array"
},
"plugins": {
"default": [],
"items": {
"$ref": "#/definitions/PluginsMigration"
},
"type": "array"
},
"sessions": {
"default": [],
"items": {
"$ref": "#/definitions/SessionMigration"
},
"type": "array"
},
"subagents": {
"default": [],
"items": {
"$ref": "#/definitions/SubagentMigration"
},
"type": "array"
}
},
"required": [
"plugins"
],
"type": "object"
},
"PluginsMigration": {
@@ -74,6 +144,38 @@
"pluginNames"
],
"type": "object"
},
"SessionMigration": {
"properties": {
"cwd": {
"type": "string"
},
"path": {
"type": "string"
},
"title": {
"type": [
"string",
"null"
]
}
},
"required": [
"cwd",
"path"
],
"type": "object"
},
"SubagentMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
}
},
"properties": {

View File

@@ -1,6 +1,17 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"definitions": {
"CommandMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
},
"ExternalAgentConfigMigrationItem": {
"properties": {
"cwd": {
@@ -39,22 +50,81 @@
"CONFIG",
"SKILLS",
"PLUGINS",
"MCP_SERVER_CONFIG"
"MCP_SERVER_CONFIG",
"SUBAGENTS",
"HOOKS",
"COMMANDS",
"SESSIONS"
],
"type": "string"
},
"HookMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
},
"McpServerMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
},
"MigrationDetails": {
"properties": {
"commands": {
"default": [],
"items": {
"$ref": "#/definitions/CommandMigration"
},
"type": "array"
},
"hooks": {
"default": [],
"items": {
"$ref": "#/definitions/HookMigration"
},
"type": "array"
},
"mcpServers": {
"default": [],
"items": {
"$ref": "#/definitions/McpServerMigration"
},
"type": "array"
},
"plugins": {
"default": [],
"items": {
"$ref": "#/definitions/PluginsMigration"
},
"type": "array"
},
"sessions": {
"default": [],
"items": {
"$ref": "#/definitions/SessionMigration"
},
"type": "array"
},
"subagents": {
"default": [],
"items": {
"$ref": "#/definitions/SubagentMigration"
},
"type": "array"
}
},
"required": [
"plugins"
],
"type": "object"
},
"PluginsMigration": {
@@ -74,6 +144,38 @@
"pluginNames"
],
"type": "object"
},
"SessionMigration": {
"properties": {
"cwd": {
"type": "string"
},
"path": {
"type": "string"
},
"title": {
"type": [
"string",
"null"
]
}
},
"required": [
"cwd",
"path"
],
"type": "object"
},
"SubagentMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
}
},
"properties": {

View File

@@ -1,5 +1,6 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "Deprecated legacy notification for `apply_patch` textual output.\n\nThe server no longer emits this notification.",
"properties": {
"delta": {
"type": "string"

View File

@@ -42,6 +42,22 @@
],
"title": "ChatgptAccount",
"type": "object"
},
{
"properties": {
"type": {
"enum": [
"amazonBedrock"
],
"title": "AmazonBedrockAccountType",
"type": "string"
}
},
"required": [
"type"
],
"title": "AmazonBedrockAccount",
"type": "object"
}
]
},

View File

@@ -0,0 +1,19 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"properties": {
"message": {
"description": "Concise guardian warning message for the user.",
"type": "string"
},
"threadId": {
"description": "Thread target for the guardian warning.",
"type": "string"
}
},
"required": [
"message",
"threadId"
],
"title": "GuardianWarningNotification",
"type": "object"
}

View File

@@ -160,6 +160,8 @@
"project",
"mdm",
"sessionFlags",
"plugin",
"cloudRequirements",
"legacyManagedConfigFile",
"legacyManagedConfigMdm",
"unknown"

Some files were not shown because too many files have changed in this diff Show More