Commit Graph

5179 Commits

Author SHA1 Message Date
Felipe Coury
5591912f0b fix(tui): reflow scrollback on terminal resize (#18575)
Fixes multiple scrollback and terminal resize issues: #5538, #5576,
#8352, #12223, #16165, and #15380.

## Why

Codex writes finalized transcript output into terminal scrollback after
wrapping it for the current viewport width. A later terminal resize
could leave that scrollback shaped for the old width, so wider windows
kept narrow output and narrower windows could show stale wrapping
artifacts until enough new output replaced the visible area.

This is also the foundation PR for responsive markdown tables. Table
rendering needs finalized transcript content to be width-sensitive after
insertion, not only while content is first streaming. Markdown table
rendering itself stays in #18576.

## Stack

- PR1: resize backlog reflow and interrupt cleanup
- #18576: markdown table support

## What Changed

- Rebuild source-backed transcript history when the terminal width
changes. `terminal_resize_reflow` is introduced through the experimental
feature system, but is enabled by default for this rollout so we can
validate behavior across real terminals.
- Preserve assistant and plan stream source so finalized streaming
output can participate in resize reflow after consolidation.
- Debounce resize work, but force a final source-backed reflow when a
resize happened during active or unconsolidated streaming output.
- Clear stale pending history lines on resize so old-width wrapped
output is not emitted just before rebuilt scrollback.
- Bound replay work with `[tui.terminal_resize_reflow].max_rows`:
omitted uses terminal-specific defaults, `0` keeps all rendered rows,
and a positive value sets an explicit cap. The cap applies both while
initially replaying a resumed transcript into scrollback and when
rebuilding scrollback after terminal resize.
- Consolidate interrupted assistant streams before cleanup, then clear
pending stream output and active-tail state consistently.
- Move resize reflow and thread event buffering helpers out of `app.rs`
into dedicated TUI modules.
- Add focused coverage for resize reflow, feature-gated behavior,
streaming source preservation, interrupted output cleanup,
unicode-neutral text, terminal-specific row caps, and composer/layout
stability.

## Runtime Bounds

Resize reflow keeps only the most recent rendered rows when a row cap is
active. The default is `auto`, which maps to the detected terminal's
default scrollback size where Codex can identify it: VS Code `1000`,
Windows Terminal `9001`, WezTerm `3500`, and Alacritty `10000`.
Terminals without a dedicated mapping use the conservative fallback of
`1000` rows. Users can override this with `[tui.terminal_resize_reflow]
max_rows = N`, or set `max_rows = 0` to disable row limiting.

## Validation

- `just fmt`
- `git diff --check`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui reflow`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui
transcript_reflow`
- `just fix -p codex-tui`
- PR CI in progress on the squashed branch
2026-04-25 22:00:32 -03:00
Shijie Rao
4e30281a13 Guard npm update readiness (#19389)
## Why
For npm/Bun-managed installs, the update prompt was treating the latest
GitHub release as ready to install. During the `0.124.0` release, GitHub
and npm visibility were not atomic: the root npm wrapper could become
visible before the npm registry marked that version as the package
`latest`. That left a window where users could be prompted to upgrade
before npm was ready for the release.

## What changed
- Keep GitHub Releases as the candidate latest-version source for
npm/Bun installs, but only write the existing `version.json` cache after
npm registry metadata proves that same root version is ready.
- Add `codex-rs/tui/src/npm_registry.rs` to validate npm readiness by
checking `dist-tags.latest` and root package `dist` metadata for the
GitHub candidate version.
- Move version parsing helpers into
`codex-rs/tui/src/update_versions.rs` so that logic can be tested
without compiling the release-only `updates.rs` module under tests.
- Update `.github/workflows/rust-release.yml` so the six known platform
tarballs publish before the root `@openai/codex` wrapper. Other npm
tarballs publish before the root wrapper, and the SDK publishes after
the root package it depends on.
2026-04-25 17:09:29 -07:00
Michael Bolin
9881dc7306 fix: restore 30-minute timeout for Bazel builds (#19609)
I think raising it to 45 minutes in
https://github.com/openai/codex/pull/19578 was a mistake for the reasons
explained in the comments in the code. Instead, we attempt to defend
against timeouts by increasing the number of shards in
`app-server-all-test` so that a "true failure" that gets run 3x should
not take as much wall clock time.
2026-04-25 16:34:06 -07:00
Michael Bolin
d54493ba1c test: stabilize app-server path assertions on Windows (#19604)
## Why

Windows can represent the same canonical local path with either a normal
drive path or a verbatim device path prefix. The failure pattern that
motivated this PR was an assertion diff like `C:\...` versus
`\\?\C:\...`: different spellings, same file.

That became visible while validating the permissions stack above this
PR. The stack increasingly routes paths through `AbsolutePathBuf`, which
normalizes supported Windows device prefixes, while several existing
tests still built expected values directly with
`std::fs::canonicalize()` or compared `AbsolutePathBuf::as_path()` to a
raw `PathBuf`. On Windows, that can make tests fail because the two
sides choose different textual forms for an otherwise equivalent
canonical path.

This PR is intentionally split out as the bottom PR below #19606. The
runtime permissions migration should not carry unrelated Windows test
stabilization, and reviewers should be able to verify this as a
test-only change before looking at the larger permissions changes.

## Failure Modes Covered

- `conversation_summary` expected rollout paths were built from raw
canonicalized `PathBuf`s, while app-server responses could carry
`AbsolutePathBuf`-normalized paths.
- `thread_resume` compared returned thread paths directly to previously
stored or fixture paths, so a verbatim-prefix spelling could fail an
otherwise correct resume.
- `marketplace_add` compared plugin install roots through `as_path()`
against raw canonicalized paths, reproducing the same `C:\...` versus
`\\?\C:\...` mismatch in both app-server and core-plugin coverage.

## What Changed

- In `app-server/tests/suite/conversation_summary.rs`, normalize both
expected rollout paths and received `ConversationSummary.path` values
through `AbsolutePathBuf` before comparing the full summary object.
- In `app-server/tests/suite/v2/thread_resume.rs`, normalize both sides
of thread path comparisons before asserting equality. This keeps the
tests focused on whether resume returned the same existing path, not
whether Windows used the same string spelling.
- In `app-server/tests/suite/v2/marketplace_add.rs` and
`core-plugins/src/marketplace_add.rs`, compare install roots as
`AbsolutePathBuf` values instead of comparing an absolute-path wrapper
to a raw canonicalized `PathBuf`.

## Behavior

This PR does not change production app-server or marketplace behavior.
It only changes tests to assert semantic path identity across Windows
path spelling variants. It also leaves API response values untouched;
the normalization happens inside assertions only.

## Verification

Targeted local checks run while extracting this fix:

- `cargo test -p codex-app-server
get_conversation_summary_by_thread_id_reads_rollout`
- `cargo test -p codex-app-server
get_conversation_summary_by_relative_rollout_path_resolves_from_codex_home`
- `cargo test -p codex-app-server
thread_resume_prefers_path_over_thread_id`

Windows-specific confidence comes from the Bazel Windows CI job for this
PR, since the failure is platform-specific.

## Docs

No docs update is needed because this is test-only infrastructure
stabilization.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19604).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19604
2026-04-25 16:25:28 -07:00
viyatb-oai
9aaa5d9358 [codex] Bypass managed network for escalated exec (#19595)
## Why

`sandbox_permissions = "require_escalated"` is treated as an explicit
request to approve the command and run it outside the
filesystem/platform sandbox. Before this change, shell and unified exec
still registered managed network approval context and could inject
Codex-managed proxy state into the child process, which meant an
approved escalated command could still hit a second network approval
path.

This PR makes that escalation boundary consistent: once a command is
explicitly approved to run outside the sandbox, Codex does not also
route that process through the managed network proxy.

## Security impact

Command/filesystem sandbox approval now implies network approval for
that command. If an untrusted command or script is allowed to run with
`require_escalated`, its network calls are unsandboxed: Codex-managed
network allowlists and denylists are not respected for that process, so
the command can exfiltrate any data it can read.

## What changed

- Skip managed network approval specs for
`SandboxPermissions::RequireEscalated`.
- Pass `network: None` into shell, zsh-fork shell, and unified exec
sandbox preparation for explicitly escalated requests.
- Strip Codex-managed proxy environment variables when
`CODEX_NETWORK_PROXY_ACTIVE` is present, while preserving user proxy env
when the Codex marker is absent.
- Add regression coverage for the prepared exec request so the old
behavior cannot silently reappear.

## Verification

- `cargo test -p codex-core explicit_escalation`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
2026-04-25 23:23:58 +00:00
Eric Traut
0c785598b3 Keep slash command popup columns stable while scrolling (#19511)
## Why

Fixes #19499.

The slash-command popup recalculated the command-name column from only
the rows visible in the current viewport. That made the description
column shift horizontally while scrolling through `/` commands whenever
longer command names entered or left the visible window.

## What Changed

`codex-rs/tui/src/bottom_pane/command_popup.rs` now uses the shared
selection-popup `AutoAllRows` column-width mode for both height
measurement and rendering. This keeps the command description column
based on the full filtered slash-command list instead of the current
viewport.

## Verification

- `cargo test -p codex-tui bottom_pane::command_popup`
2026-04-25 14:25:58 -07:00
Michael Bolin
f41306b4f3 test: isolate remote thread store regression from plugin warmups (#19593)
Follow-up to #19266.

## Why


`thread_start_with_non_local_thread_store_does_not_create_local_persistence`
is meant to catch accidental local thread persistence when a non-local
thread store is configured. The Windows flake reported in [this
BuildBuddy
invocation](https://app.buildbuddy.io/invocation/0b75dde4-6828-4e7b-a35b-e45b73fb005d)
showed that the assertion was tripping on an unexpected top-level `.tmp`
entry:

```diff
 {
+    ".tmp",
     "config.toml",
     "installation_id",
     "memories",
     "skills",
 }
```

That `.tmp` does not appear to come from `tempfile::TempDir`; it comes
from unrelated plugin startup work that can legitimately materialize
`codex_home/.tmp`, including the startup remote plugin sync marker in
[`core/src/plugins/startup_sync.rs`](bce74c70ce/codex-rs/core/src/plugins/startup_sync.rs (L13-L15))
and the curated plugin snapshot under
[`.tmp/plugins`](bce74c70ce/codex-rs/core-plugins/src/startup_sync.rs (L25-L26)).

That makes the regression race unrelated background startup tasks
instead of validating the thread-store invariant it was added to cover.
Rather than weakening the assertion to allow arbitrary `.tmp` entries,
this change isolates the test from plugin warmups so it can stay strict
about unexpected local thread persistence artifacts.

## What changed

- disable plugins in the generated config used by
`app-server/tests/suite/v2/remote_thread_store.rs`
- keep the existing `codex_home` assertions unchanged so the test still
fails if local session or sqlite persistence is introduced

## Verification

- `cargo test -p codex-app-server
suite::v2::remote_thread_store::thread_start_with_non_local_thread_store_does_not_create_local_persistence
-- --exact`
2026-04-25 20:45:31 +00:00
Eric Traut
bce74c70ce Restore persisted model provider on thread resume (#19287)
Fixes #15219.

## Why

`thread/resume` should continue a persisted thread with the same model
provider that created the thread. The app server already restores the
persisted model and reasoning effort before resuming, but it was leaving
`model_provider` unset. If a user created a thread with one provider and
later switched their active profile to another provider, resumed
encrypted history could be sent to the wrong endpoint and fail with
`invalid_encrypted_content`.

The thread metadata already records the original provider, so resume
should apply it when the caller has not explicitly requested a different
model/provider/reasoning configuration.

## What changed

This updates `merge_persisted_resume_metadata` in
`app-server/src/codex_message_processor.rs` to copy
`ThreadMetadata::model_provider` into `ConfigOverrides::model_provider`
alongside the persisted model.

The existing resume metadata tests now also assert that:

- the persisted provider is restored for normal resume
- explicit model, provider, or reasoning-effort overrides still prevent
persisted resume metadata from being applied
- a thread with no persisted model or reasoning effort still resumes
with its persisted provider

## Verification

- `cargo test -p codex-app-server` passed the app-server unit tests,
including the updated resume metadata coverage. The broader integration
portion of that command failed in an unrelated environment-sensitive
skills-budget warning assertion, where this run saw 8 omitted skills
instead of the expected 7.
- `just fix -p codex-app-server` completed successfully.
2026-04-25 12:40:00 -07:00
Ahmed Ibrahim
022f81df1f [codex] Order codex-mcp items by visibility (#19526)
## Why

The visibility cleanup in the base PR reduced what `codex-mcp` exposes,
but several files still made reviewers read private support machinery
before the public or crate-facing entry points. This ordering pass makes
each file easier to scan: exported API first, crate-visible MCP
internals next, then private helpers in breadth-first order from the
higher-level MCP flows to leaf utilities.

## What Changed

- Reordered `codex-mcp` exports so the runtime, configuration, snapshot,
auth, and helper surfaces are grouped by visibility and reader
importance.
- Moved public and crate-visible MCP items ahead of private helpers in
the auth, MCP planning/snapshot, connection manager, and tool-name
modules.
- Kept the change mechanical, with no behavior changes intended.

## Verification

- `cargo check -p codex-mcp`
2026-04-25 07:17:30 -07:00
Ahmed Ibrahim
706490ab1b [codex] Prune unused codex-mcp API and duplicate helpers (#19524)
## Why

`codex-mcp` currently exposes more API than the rest of the workspace
uses. Some of that surface is simply visibility that can be tightened,
and some of it is public helper code that remains compiler-valid because
it is exported even though no workspace caller uses it.

That distinction matters: Rust does not warn on exported API just
because the current workspace does not call it. This PR intentionally
treats those exported-but-workspace-unreferenced paths as stale
`codex-mcp` surface. The main example is MCP skill dependency
collection, where the active implementation now lives in
`codex-rs/core/src/mcp_skill_dependencies.rs`; keeping the older
`codex-mcp` copy makes it unclear which implementation owns skill MCP
installation.

## What Changed

- Pruned unused `codex-mcp` re-exports from `codex-mcp/src/lib.rs`.
- Removed non-runtime helper methods from `McpConnectionManager` so it
stays focused on live MCP clients.
- Made `ToolPluginProvenance` lookup methods crate-private.
- Removed workspace-unreferenced snapshot wrapper APIs and
qualified-tool grouping helpers.
- Deleted the duplicate `codex-mcp` skill dependency module and tests
now that skill MCP dependency handling is owned by `core`.

## Verification

- `cargo check -p codex-mcp`
2026-04-25 06:36:07 -07:00
Matthew Zeng
6e838a19fa Enable unavailable dummy tools by default (#19459)
## Summary
- Mark `unavailable_dummy_tools` as a stable feature and enable it by
default
- Update the feature registry test to match the new default state

## Testing
- `just fmt`
- `cargo test -p codex-features`
2026-04-25 08:46:57 +00:00
Eric Traut
a2db6f97fb Fix codex-rs README grammar (#19514)
## Why

Issue #19418 points out a small grammar issue in `codex-rs/README.md`
under "Code Organization." The current sentence says "we hope this to
be," which reads awkwardly.

Fixes #19418.

## What changed

Updated the `core/` crate description so the sentence reads "we hope
this becomes a library crate."

## Verification

Documentation-only change. Reviewed the Markdown diff.
2026-04-24 23:31:47 -07:00
Dylan Hurd
f5497f4d65 Split approval matrix test groups (#19454)
## Why

Recent `main` CI repeatedly timed out in:

- `codex-core::all suite::approvals::approval_matrix_covers_all_modes`

It failed in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
[24905823212](https://github.com/openai/codex/actions/runs/24905823212),
[24903439629](https://github.com/openai/codex/actions/runs/24903439629),
[24903336028](https://github.com/openai/codex/actions/runs/24903336028),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).

The failure pattern was a 60s Linux remote timeout. Logs showed many
approval scenarios completing before the single matrix test timed out.

## Root Cause

`approval_matrix_covers_all_modes` packed every approval/sandbox/tool
scenario into one test case. That made the test vulnerable to normal CI
variance: one slow scenario or a slow process startup could push the
whole monolithic case past the 60s per-test timeout. It also hid which
part of the matrix was slow because the runner only reported the one
large matrix test.

## What Changed

- Keep the shared `scenarios()` table as the single source of approval
matrix coverage.
- Use one `#[test_case]` per `ScenarioGroup` to generate five async
Tokio tests: danger/full-access, read-only, workspace-write,
apply-patch, and unified-exec.
- Keep the group runner small and add per-scenario error context so a
failure still reports the specific scenario name.

## Why This Should Be Reliable

Each scenario group now has its own test harness timeout instead of
sharing one timeout window with the full matrix. That removes the long
sequential loop from a single test while keeping the implementation
compact and easy to scan.

The tests still run through the same scenario definitions and runner, so
this preserves coverage. `test-case` already composes with
`#[tokio::test]` in this crate and is already available for test code.

## Verification

- `cargo test -p codex-core --test all approval_matrix_ -- --list`
- `cargo test -p codex-core --test all approval_matrix_`
2026-04-24 21:38:27 -07:00
Eric Traut
f1c963d77e Add goal TUI UX (5 / 5) (#18077)
Adds the TUI user experience for goals on top of the core runtime from
PR 4.

## Why

Users need a direct TUI control surface for long-running goals. The UI
should make the current goal visible, support common goal actions
without waiting for a model turn, and avoid confusing end-of-turn
notifications while an active goal is immediately continuing.

## What changed

- Added `/goal` summary rendering for the current goal, including
active, paused, budget-limited, and complete states.
- Added `/goal <objective>` creation/replacement through the app-server
goal API rather than a model prompt.
- Added `/goal clear`, `/goal pause`, and `/goal unpause` command
variants.
- Added a confirmation menu when the user enters a new goal while
another goal already exists.
- Updated `/goal` help and summary tip text so it reflects the supported
command variants without advertising slash-command token budgets.
- Added footer/statusline goal indicators, including elapsed time and
token budget display when a budget exists from API/tool-created goals.
- Consumes goal updated/cleared notifications so the TUI stays in sync
with external app-server changes.
- Suppresses end-of-turn desktop notifications only when a goal is still
active and follow-up work is expected.
- Preserves slash-command history behavior and avoids leaking queued
`/goal` state into unrelated submissions.

## Verification

- Added TUI unit and snapshot coverage for goal command availability,
summary rendering, control commands, replacement menu behavior,
status/footer display, notification handling, and command history.
2026-04-24 21:16:45 -07:00
Eric Traut
4167628622 Add goal core runtime (4 / 5) (#18076)
Adds the core runtime behavior for active goals on top of the model
tools from PR 3.

## Why

A long-running goal should be a core runtime concern, not something
every client has to implement. Core owns the turn lifecycle, tool
completion boundaries, interruptions, resume behavior, and token usage,
so it is the right place to account progress, enforce budgets, and
decide when to continue work.

## What changed

- Centralized goal lifecycle side effects behind
`Session::goal_runtime_apply(GoalRuntimeEvent::...)`.
- Starts goal continuation turns only when the session is idle; pending
user input and mailbox work take priority.
- Accounts token and wall-clock usage at turn, tool, mutation,
interrupt, and resume boundaries; `get_thread_goal` remains read-only.
- Preserves sub-second wall-clock remainder across accounting boundaries
so long-running goals do not drift downward over time.
- Treats token budget exhaustion as a soft stop by marking the goal
`budget_limited` and injecting wrap-up steering instead of aborting the
active turn.
- Suppresses budget steering when `update_goal` marks a goal complete.
- Pauses active goals on interrupt and auto-reactivates paused goals
when a thread resumes outside plan mode.
- Suppresses repeated automatic continuation when a continuation turn
makes no tool calls.
- Added continuation and budget-limit prompt templates.

## Verification

- Added focused core coverage for continuation scheduling, accounting
boundaries, budget-limit steering, completion accounting, interrupt
pause behavior, resume auto-activation, and wall-clock remainder
accounting.
2026-04-24 21:16:00 -07:00
Eric Traut
32ace07ac5 Add goal model tools (3 / 5) (#18075)
Adds the model-facing goal tools on top of the app-server API from PR 2.

## Why

Once goals are persisted and exposed to clients, the model needs a
small, constrained tool surface for goal workflows. The tool contract
should let the model inspect goals, create them only when explicitly
requested, and mark them complete without giving it broad control over
user/runtime-owned state.

## What changed

- Added `get_goal`, `create_goal`, and `update_goal` tool specs behind
the `goals` feature flag.
- Added core goal tool handlers that validate objectives and token
budgets before mutating persisted state.
- Constrained `create_goal` to create only when no goal exists, with
optional `token_budget` only when a budget is explicitly provided.
- Tightened the `create_goal` instructions so the model does not infer
goals from ordinary task requests.
- Constrained `update_goal` to expose only goal completion; pause,
resume, clear, and budget-limited transitions remain user- or
runtime-controlled.
- Registered the goal tools in the tool registry and kept them out of
review contexts where they should not appear.

## Verification

- Added tool-registry coverage for feature gating and tool availability.
- Added core session tests for create/get/update behavior, duplicate
goal rejection, budget validation, and completion-only updates.
2026-04-24 20:54:40 -07:00
Eric Traut
6c874f9b34 Add goal app-server API (2 / 5) (#18074)
Adds the app-server v2 goal API on top of the persisted goal state from
PR 1.

## Why

Clients need a stable app-server surface for reading and controlling
materialized thread goals before the model tools and TUI can use them.
Goal changes also need to be observable by app-server clients, including
clients that resume an existing thread.

## What changed

- Added v2 `thread/goal/get`, `thread/goal/set`, and `thread/goal/clear`
RPCs for materialized threads.
- Added `thread/goal/updated` and `thread/goal/cleared` notifications so
clients can keep local goal state in sync.
- Added resume/snapshot wiring so reconnecting clients see the current
goal state for a thread.
- Added app-server handlers that reconcile persisted rollout state
before direct goal mutations.
- Updated the app-server README plus generated JSON and TypeScript
schema fixtures for the new API surface.

## Verification

- Added app-server v2 coverage for goal get/set/clear behavior,
notification emission, resume snapshots, and non-local thread-store
interactions.
2026-04-24 20:53:41 -07:00
Eric Traut
0ee737cea6 Add goal persistence foundation (1 / 5) (#18073)
Adds the persisted goal foundation for the rest of the stack. This PR is
intentionally limited to feature flag and state-layer behavior;
app-server APIs, model tools, runtime continuation, and TUI UX are
layered in later PRs.

## Why

Goal mode needs durable thread-level state before clients or model tools
can safely build on it. The state layer needs to know whether a goal
exists, what objective it tracks, whether it is active, paused,
budget-limited, or complete, and how much time/token usage has already
been accounted.

## What changed

- Added the `goals` feature flag and generated config schema entry.
- Added the `thread_goals` state table and Rust model for persisted
thread goals.
- Added state runtime APIs for creating, replacing, updating, deleting,
and accounting goal usage.
- Added `goal_id`-based stale update protection so an old goal update
cannot overwrite a replacement.
- Kept this PR scoped to persistence and state runtime behavior, with no
app-server, model-facing, continuation, or TUI behavior yet.

## Verification

- Added state runtime coverage for goal creation, replacement, stale
update protection, status transitions, token-budget behavior, and usage
accounting.
2026-04-24 20:51:38 -07:00
Curtis 'Fjord' Hawthorne
8a559e7938 Remove js_repl feature (#19410) 2026-04-24 17:49:29 -07:00
Curtis 'Fjord' Hawthorne
cf02e9c052 Fix Bazel cargo_bin runfiles paths (#19468)
## Summary

Fix a Bazel-only path resolution bug in
`codex_utils_cargo_bin::cargo_bin`.

Under Bazel runfiles, `rlocation` can return a relative `bazel-out/...`
path even though `cargo_bin()` documents that it returns an absolute
path. That can break callers that store the returned binary path and
later spawn it after changing cwd, because the relative path is resolved
from the wrong directory.

This patch absolutizes the runfiles-resolved path before returning it.
2026-04-24 17:47:31 -07:00
Michael Bolin
789f387982 permissions: remove legacy read-only access modes (#19449)
## Why

`ReadOnlyAccess` was a transitional legacy shape on `SandboxPolicy`:
`FullAccess` meant the historical read-only/workspace-write modes could
read the full filesystem, while `Restricted` tried to carry partial
readable roots. The partial-read model now belongs in
`FileSystemSandboxPolicy` and `PermissionProfile`, so keeping it on
`SandboxPolicy` makes every legacy projection reintroduce lossy
read-root bookkeeping and creates unnecessary noise in the rest of the
permissions migration.

This PR makes the legacy policy model narrower and explicit:
`SandboxPolicy::ReadOnly` and `SandboxPolicy::WorkspaceWrite` represent
the old full-read sandbox modes only. Split readable roots, deny-read
globs, and platform-default/minimal read behavior stay in the runtime
permissions model.

## What changed

- Removes `ReadOnlyAccess` from
`codex_protocol::protocol::SandboxPolicy`, including the generated
`access` and `readOnlyAccess` API fields.
- Updates legacy policy/profile conversions so restricted filesystem
reads are represented only by `FileSystemSandboxPolicy` /
`PermissionProfile` entries.
- Keeps app-server v2 compatible with legacy `fullAccess` read-access
payloads by accepting and ignoring that no-op shape, while rejecting
legacy `restricted` read-access payloads instead of silently widening
them to full-read legacy policies.
- Carries Windows sandbox platform-default read behavior with an
explicit override flag instead of depending on
`ReadOnlyAccess::Restricted`.
- Refreshes generated app-server schema/types and updates tests/docs for
the simplified legacy policy shape.

## Verification

- `cargo check -p codex-app-server-protocol --tests`
- `cargo check -p codex-windows-sandbox --tests`
- `cargo test -p codex-app-server-protocol sandbox_policy_`


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19449).
* #19395
* #19394
* #19393
* #19392
* #19391
* __->__ #19449
2026-04-24 17:16:58 -07:00
Celia Chen
d19de6d150 fix: Bedrock GPT-5.4 reasoning levels (#19461)
## Why

When using the Amazon Bedrock provider with `openai.gpt-5.4-cmb`, the
model picker allowed `xhigh` because the CMB catalog entry was derived
from the bundled `gpt-5.4` reasoning metadata. Bedrock rejects that
effort level, causing the request to fail before the turn can run:

```text
{"error":{"code":"validation_error","message":"Failed to deserialize the JSON body into the target type: Invalid 'reasoning': Invalid 'effort': unknown variant `xhigh`, expected one of `high`, `low`, `medium`, `minimal` at line 1 column 77239","param":null,"type":"invalid_request_error"}}
```

## What Changed

- Replace the runtime lookup of bundled `gpt-5.4` metadata for
`openai.gpt-5.4-cmb` with an explicit Bedrock CMB `ModelInfo` entry.
- Advertise only the Bedrock-supported CMB reasoning levels: `minimal`,
`low`, `medium`, and `high`.
- Keep the existing GPT OSS Bedrock model metadata and reasoning levels
unchanged.
- Add catalog coverage for the hardcoded CMB metadata and
Bedrock-compatible reasoning level list.
2026-04-25 00:05:22 +00:00
Rasmus Rygaard
5378cccd8a Refactor log DB into LogWriter interface (#19234)
## Why

This prepares feedback log capture for a future remote app-server hook
sink without changing the current local SQLite upload path. The
important boundary is now intentionally small: a log sink is a tracing
`Layer` that can also flush entries it has accepted.

That keeps the existing SQLite implementation simple while giving the
upcoming gRPC sink a place to fit beside it. SQLite and gRPC have
different worker/write semantics, so this PR avoids introducing a shared
buffered-sink abstraction and instead lets each `LogWriter` own the
buffering mechanics it needs.

## What Changed

- Added `LogSinkQueueConfig` with the existing local defaults: queue
capacity `512`, batch size `128`, and flush interval `2s`.
- Added `LogDbLayer::start_with_config(...)` while preserving
`LogDbLayer::start(...)` and `log_db::start(...)` defaults.
- Introduced the `LogWriter` trait as the minimal shared interface:
`tracing_subscriber::Layer` plus `flush()`.
- Made `LogDbLayer` implement `LogWriter`.
- Kept tracing event formatting inside `LogDbLayer`; it still creates
one `LogEntry` per tracing event before queueing it for SQLite.
- Kept normal event capture best-effort and non-blocking via bounded
`try_send`.

## Behavior Notes

This does not change the SQLite schema, retention behavior,
`/feedback/upload`, or Sentry upload behavior. Normal log events still
drop when the queue is full; explicit `flush()` still waits for queue
capacity and receiver processing before returning.

## Verification

- `cargo test -p codex-state log_db`
- `cargo test -p codex-state`
- `just fix -p codex-state`

The added tests cover configured batch-size flushing, configured
interval flushing, queue-full drops, and the flush barrier semantics.
2026-04-24 16:27:39 -07:00
Dylan Hurd
32aad7bd13 Serialize legacy Windows PowerShell sandbox tests (#19453)
## Why

Recent `main` CI had repeated Windows timeouts in the legacy sandbox
process tests:

- `codex-windows-sandbox
session::tests::legacy_capture_powershell_emits_output` failed in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
[24905411571](https://github.com/openai/codex/actions/runs/24905411571),
[24903336028](https://github.com/openai/codex/actions/runs/24903336028),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).
- `legacy_tty_powershell_emits_output_and_accepts_input` failed in the
same set of runs.
- `legacy_non_tty_cmd_emits_output` failed in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
and
[24903336028](https://github.com/openai/codex/actions/runs/24903336028).
- `legacy_non_tty_powershell_emits_output` failed in runs
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
and
[24903336028](https://github.com/openai/codex/actions/runs/24903336028).

These failures were 30s timeouts on Windows x64 and/or arm64 rather than
assertion failures.

## Root Cause

The active legacy Windows sandbox process tests all exercise host-level
resources: sandbox setup, ACL/user state, private desktop process
launch, stdio capture, and PowerShell/cmd child cleanup. Running several
of these tests concurrently can leave them competing for the same
Windows sandbox setup path and process/session resources, which makes
command startup or output collection hang under CI load.

## What Changed

- Added a shared in-process mutex for the active legacy Windows sandbox
process tests.
- Held that guard across each legacy cmd/PowerShell process test so
those host-resource-heavy cases run one at a time.
- Kept the skipped legacy cmd TTY tests unchanged.

## Why This Should Be Reliable

The tests still use unique homes and run the real legacy sandbox process
path, but they no longer overlap the fragile host-level setup and
process/session lifecycle. Serializing just this small group removes the
concurrency race without reducing the behavioral coverage of each test.

## Verification

- `cargo test -p codex-windows-sandbox`
- GitHub Windows CI is the primary validation signal for the affected
tests; on this PR, Windows clippy, Windows release, and Windows local
Bazel passed after the serialization fix.
2026-04-24 16:18:30 -07:00
rreichel3-oai
219c65dc2f [codex] Forward Codex Apps tool call IDs to backend metadata (#19207)
## Summary
- include the outer tool `call_id` in Codex Apps MCP request metadata
under `_meta._codex_apps.call_id`
- preserve existing Codex Apps metadata like `resource_uri` and
`contains_mcp_source`
- add request metadata coverage for both the existing-metadata and
no-existing-metadata cases

## Why
The paired backend change in
[openai/openai#850796](https://github.com/openai/openai/pull/850796)
updates MCP compliance logging to prefer `_meta._codex_apps.call_id`
instead of the JSON-RPC request id. This client change sends that outer
tool call id so the backend can record the model/tool call identifier
when it is available.

This is wire-compatible with older backends because `_meta._codex_apps`
is already reserved backend-only metadata. Backends that do not read
`call_id` will ignore the extra field.

## Testing
- `cargo test -p codex-core request_meta`
- `just fmt`
- `just fix -p codex-core`
2026-04-24 18:49:34 -04:00
xl-openai
1e560f33e1 feat: Compress skill paths with root aliases (#19098)
Add skill root tracking so model-visible skill lists can use short path
aliases when absolute paths would exceed the metadata budget.
2026-04-24 15:49:07 -07:00
Tom
588f7a9fc4 [codex] add non-local thread store regression harness (#19266)
- Add an integration test that guarantees nothing gets written to codex
home dir or sqlite when running a rollout with a non-local ThreadStore
- Add an in-memory "spy" ThreadStore for tests like this

Note I could not find a good way to also ensure there were no filesystem
_reads_ that didn't go through threadstore. I explored a more elaborate
sandboxed-subprocess approach but it isn't platform portable and felt
like it wasn't (yet) worth it.
2026-04-24 15:45:44 -07:00
Konstantine Kahadze
3c6e2638ac Clarify bundled OpenAI Docs upgrade guide wording (#19422)
## Summary
- Mirrors the OpenAI Docs skill cleanup in the bundled Codex skill copy
- Clarifies reasoning-effort recommendation wording
- Replaces internal snake_case prompt block names with natural-language
guidance aligned to the prompting guide

## Test plan
- `git diff --check`
- Verified the old snake_case prompt block names no longer appear in the
bundled upgrade guide
2026-04-24 22:35:52 +00:00
Ahmed Ibrahim
6de6eaa0c1 [4/4] Honor Streamable HTTP MCP placement (#18584) 2026-04-24 15:03:55 -07:00
Konstantine Kahadze
c43e2fcfbf Add gpt-image-2 to bundled OpenAI Docs skill (#19443)
## Summary
- Mirrors openai/skills#374 in the Codex bundled OpenAI Docs skill
- Adds `gpt-image-2` as the best image generation/edit model
- Updates `gpt-image-1.5` to less expensive image generation/edit
quality

## Test plan
- `git diff --check`
2026-04-24 21:48:45 +00:00
Tom
0a9b559c0b Migrate fork and resume reads to thread store (#18900)
- Route cold thread/resume and thread/fork source loading through
ThreadStore reads instead of direct rollout path operations
- Keep lookups that explicitly specify a rollout-path using the local
thread store methods but return an invalid-request error for remote
ThreadStore configurations
- Add some additional unit tests for code path coverage
2026-04-24 13:51:37 -07:00
Michael Bolin
13e0ec1614 permissions: make legacy profile conversion cwd-free (#19414)
## Why

The profile conversion path still required a `cwd` even when it was only
translating a legacy `SandboxPolicy` into a `PermissionProfile`. That
made profile producers invent an ambient `cwd`, which is exactly the
anchoring we are trying to remove from permission-profile data. A legacy
workspace-write policy can be represented symbolically instead: `:cwd =
write` plus read-only `:project_roots` metadata subpaths.

This PR creates that cwd-free base so the rest of the stack can stop
threading cwd through profile construction. Callers that actually need a
concrete runtime filesystem policy for a specific cwd still have an
explicitly named cwd-bound conversion.

## What Changed

- `PermissionProfile::from_legacy_sandbox_policy` now takes only
`&SandboxPolicy`.
- `FileSystemSandboxPolicy::from_legacy_sandbox_policy` is now the
symbolic, cwd-free projection for profiles.
- The old concrete projection is retained as
`FileSystemSandboxPolicy::from_legacy_sandbox_policy_for_cwd` for
runtime/boundary code that must materialize legacy cwd behavior.
- Workspace-write profiles preserve `CurrentWorkingDirectory` and
`ProjectRoots` special entries instead of materializing cwd into
absolute paths.

## Verification

- `cargo check -p codex-protocol -p codex-core -p
codex-app-server-protocol -p codex-app-server -p codex-exec -p
codex-exec-server -p codex-tui -p codex-sandboxing -p
codex-linux-sandbox -p codex-analytics --tests`
- `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol
-p codex-app-server -p codex-exec -p codex-exec-server -p codex-tui -p
codex-sandboxing -p codex-linux-sandbox -p codex-analytics`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19414).
* #19395
* #19394
* #19393
* #19392
* #19391
* __->__ #19414
2026-04-24 13:42:05 -07:00
canvrno-oai
7262c0c450 Skip disabled rows in selection menu numbering and default focus (#19170)
Selection menus in the TUI currently let disabled rows interfere with
numbering and default focus. This makes mixed menus harder to read and
can land selection on rows that are not actionable. This change updates
the shared selection-menu behavior in list_selection_view so disabled
rows are not selected when these views open, and prevents them from
being numbered like selectable rows.

- Disabled rows no longer receive numeric labels
- Digit shortcuts map to enabled rows only
- Default selection moves to the first enabled row in mixed menus
- Updated affected snapshot
- Added snapshot coverage for a plugin detail error popup
- Added a focused unit test for shared selection-view behavior

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-24 13:21:43 -07:00
willwang-openai
687c5d9081 Update unix socket transport to use WebSocket upgrade (#19244)
## Summary
- Switch Unix socket app-server connections to perform the standard
WebSocket HTTP Upgrade handshake
- Update the Unix socket test to exercise a real upgrade over the Unix
stream
- Refresh the app-server README to describe the new Unix socket behavior

## Testing
- `cargo test -p codex-app-server transport::unix_socket_tests`
- `just fmt`
- `git diff --check`
2026-04-24 13:06:51 -07:00
Ruslan Nigmatullin
a3cccbd8ed [codex] Omit fork turns from thread started notifications (#19093)
## Why

`thread/fork` responses intentionally include copied history so the
caller can render the fork immediately, but `thread/started` is a
lifecycle notification. The v2 `Thread` contract says notifications
should return `turns: []`, and the fork path was reusing the response
thread directly, causing copied turns to be emitted through
`thread/started` as well.

## What Changed

- Route app-server `thread/started` notification construction through a
helper that clears `thread.turns` before sending.
- Keep `thread/fork` responses unchanged so callers still receive copied
history.
- Add persistent and ephemeral fork coverage that asserts
`thread/started` emits an empty `turns` array while the response retains
fork history.

## Testing

- `just fmt`
- `cargo test -p codex-app-server`
2026-04-24 12:31:13 -07:00
Celia Chen
0db6811b7c Fix: use function apply_patch tool for Bedrock model (#19416)
## Why

`openai.gpt-5.4-cmb` is served through the Amazon Bedrock provider,
whose request validator currently accepts `function` and `mcp` tool
specs but rejects Responses `custom` tools. The CMB catalog entry reuses
the bundled `gpt-5.4` metadata, which marks `apply_patch_tool_type` as
`freeform`. That causes Codex to include an `apply_patch` tool with
`type: "custom"`, so even heavily disabled sessions can fail before the
model runs with:

```text
Invalid tools: unknown variant `custom`, expected `function` or `mcp`
```

This is provider-specific: the model should still expose `apply_patch`,
but for Bedrock it needs to use the JSON/function tool shape instead of
the freeform/custom shape.

## What Changed

- Override the `openai.gpt-5.4-cmb` static catalog entry to set
`apply_patch_tool_type` to `function` after inheriting the rest of the
`gpt-5.4` model metadata.
- Update the catalog test expectation so the CMB entry continues to
track `gpt-5.4` metadata except for this Bedrock-specific tool shape
override.

## Verification

- `cargo test -p codex-model-provider`
- `just fix -p codex-model-provider`
2026-04-24 18:45:09 +00:00
mcgrew-oai
dee5f5ea38 Harden package-manager install policy (#19163)
## Summary

This PR hardens package-manager usage across the repo to reduce
dependency supply-chain risk. It also removes the stale `codex-cli`
Docker path, which was already broken on `main`, instead of keeping a
bitrotted container workflow alive.

## What changed

- Updated pnpm package manager pins and workspace install settings.
- Removed stale `codex-cli` Docker assets instead of trying to keep a
broken local container path alive.
- Added uv settings and lockfiles for the Python SDK packages.
- Updated Python SDK setup docs to use `uv sync`.

## Why

This is primarily a security hardening change. It reduces
package-install and supply-chain risk by ensuring dependency installs go
through pinned package managers, committed lockfiles, release-age
settings, and reviewed build-script controls.

For `codex-cli`, the right follow-up was to remove the local Docker path
rather than keep patching it:

- `codex-cli/Dockerfile` installed `codex.tgz` with `npm install -g`,
which bypassed the repo lockfile and age-gated pnpm settings.
- The local `codex-cli/scripts/build_container.sh` helper was already
broken on `main`: it called `pnpm run build`, but
`codex-cli/package.json` does not define a `build` script.
- The container path itself had bitrotted enough that keeping it would
require extra packaging-specific behavior that was not otherwise needed
by the repo.

## Gaps addressed

- Global npm installs bypassed the repo lockfile in Docker and CLI
reinstall paths, including `codex-cli/Dockerfile` and
`codex-cli/bin/codex.js`.
- CI and Docker pnpm installs used `--frozen-lockfile`, but the repo was
missing stricter pnpm workspace settings for dependency build scripts.
- Python SDK projects had `pyproject.toml` metadata but no committed
`uv.lock` coverage or uv age/index settings in `sdk/python` and
`sdk/python-runtime`.
- The secure devcontainer install path used npm/global install behavior
without a local locked package-manager boundary.
- The local `codex-cli` Docker helper was already broken on `main`, so
this PR removes that stale Docker path instead of preserving a broken
surface.
- pnpm was already pinned, but not to the current repo-wide pnpm version
target.

## Verification

- `pnpm install --frozen-lockfile`
- `.devcontainer/codex-install`: `pnpm install --prod --frozen-lockfile`
- `.devcontainer/codex-install`: `./node_modules/.bin/codex --version`
- `sdk/python`: `uv lock --check`, `uv sync --locked --all-extras
--dry-run`, `uv build`
- `sdk/python-runtime`: `uv lock --check`, `uv sync --locked --dry-run`,
`uv build --wheel`
- `pnpm -r --filter ./sdk/typescript run build`
- `pnpm -r --filter ./sdk/typescript run lint`
- `pnpm -r --filter ./sdk/typescript run test`
- `node --check codex-cli/bin/codex.js`
- `docker build -f .devcontainer/Dockerfile.secure -t codex-secure-test
.`
- `cargo build -p codex-cli`
- repo-wide package-manager audit
2026-04-24 14:36:19 -04:00
Konstantine Kahadze
6bb2fa3fd4 Update bundled OpenAI Docs skill for GPT-5.5 (#19407)
## Summary
Updates the bundled OpenAI Docs system skill for GPT-5.5.

## Changes
- Updates the bundled latest-model fallback
- Replaces bundled upgrade guidance with GPT-5.5 migration guidance
- Replaces bundled prompting guidance with GPT-5.5 prompting guidance

## Test plan
- Ran `node scripts/resolve-latest-model-info.js`
- Verified bundled files match the OpenAI Docs skill fallback content
2026-04-24 18:26:47 +00:00
iceweasel-oai
e787358f70 check PID of named pipe consumer (#19283)
## Why
The elevated Windows command runner currently trusts the first process
that connects to its parent-created named pipes. Tightening the pipe ACL
already narrows who can reach that boundary, but verifying the connected
client PID gives the parent one more fail-closed check: it only accepts
the exact runner process it just spawned.

## What changed
- validate `GetNamedPipeClientProcessId` after `ConnectNamedPipe` and
reject clients whose PID does not match the spawned runner
- also did some code de-duplication to route the one-shot elevated
capture flow in `windows-sandbox-rs/src/elevated_impl.rs` through
`spawn_runner_transport()` so both elevated codepaths use the same pipe
bootstrap and PID validation

Using the transport unification here also reduces duplication in the
elevated Windows IPC bootstrap, so future hardening to the runner
handshake only needs to land in one place.

## Validation
- `cargo test -p codex-windows-sandbox`
- manual testing: one-shot elevated path via `target/debug/codex.exe
exec` running a randomized shell command and confirming captured output
- manual testing: elevated session path via `target/debug/codex.exe -c
'windows.sandbox="elevated"' sandbox windows -- python -u -c ...` with
stdin/stdout round-trips (`READY`, then `GOT:...` for two input lines)

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-04-24 17:41:08 +00:00
Alex Zamoshchin
bcc1caa920 respect workspace option for disabling plugins (#18907)
Respects the workspace setting for plugins in Codex

Plugins menu disappears
Plugins do not load
Plugins do not load in composer

no plugins loaded
<img width="809" height="226" alt="Screenshot 2026-04-23 at 3 20 45 PM"
src="https://github.com/user-attachments/assets/3a4dba8e-69c3-4046-a77e-f13ab77f84b4"
/>


no plugins in menu
<img width="293" height="204" alt="Screenshot 2026-04-23 at 3 20 35 PM"
src="https://github.com/user-attachments/assets/5cb9bf52-ad72-488f-b90c-5eb457da09a3"
/>
2026-04-24 17:38:45 +00:00
jif-oai
f802f0a391 chore: drop MCP Plugins and App from Morpheus (#19380)
Quick fix of https://github.com/openai/codex/issues/18333
2026-04-24 17:57:48 +02:00
danwang-oai
11806faf71 Fix hang on turn/interrupt (#18392)
Fix a bug where the `turn/interrupt` RPC hangs when interrupting a turn
that has already completed.

Before this change, `turn/interrupt` requests were queued in app-server
and only answered when a later TurnAborted event arrived. If the target
turn was already complete, core treated Op::Interrupt as a no-op, so no
abort event was emitted and the RPC could hang indefinitely.

This change fixes that in two places:

* Reject turn/interrupt immediately with `INVALID_REQUEST` when the
requested turn is no longer the active turn.
* Resolve any already-accepted pending interrupt requests when the turn
reaches TurnComplete, covering the case where a turn finishes naturally
after the interrupt request is accepted but before it aborts.

I tested this by adding a failing test in
707487c063. You may view the results here:
https://github.com/openai/codex/actions/runs/24585182419/

<img width="1512" height="310" alt="CleanShot 2026-04-17 at 16 33 30@2x"
src="https://github.com/user-attachments/assets/f4a88228-b2a4-41f4-9aaa-ec82814096af"
/>
2026-04-24 10:47:50 -04:00
jif-oai
28742866c7 Add agents.interrupt_message for interruption markers (#19351)
## Why

Agent interruptions currently always persist a model-visible
interrupted-turn marker before emitting `TurnAborted`. That marker is
useful by default because it gives the next model turn context about a
deliberately interrupted task, but some deployments need to suppress
that history injection entirely while still keeping the client-visible
interruption event.

## What changed

- Add `[agents] interrupt_message = false` to disable the model-visible
interrupted-turn marker.
- Resolve the setting into `Config::agent_interrupt_message_enabled`,
defaulting to `true` so existing behavior is unchanged.
- Apply the setting to both live interrupted turns and interrupted fork
snapshots.
- Keep emitting `TurnAborted` even when the history marker is disabled.
- Regenerate `core/config.schema.json` for the new
`agents.interrupt_message` field.

## Testing

- `cargo test -p codex-core load_config_resolves_agent_interrupt_message
-- --nocapture`
- `cargo test -p codex-core
disabled_interrupted_fork_snapshot_appends_only_interrupt_event --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_developer_input_message --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_can_disable_interrupted_marker --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo check -p codex-core`
2026-04-24 16:02:45 +02:00
jif-oai
deb4509302 feat: surface multi-agent thread limit in spawn description (#19360)
## Summary
- Thread `agent_max_threads` into `ToolsConfig` and
`SpawnAgentToolOptions`.
- Render the configured `max_concurrent_threads_per_session` value in
the MultiAgentV2 `spawn_agent` description.
- Cover the description text in `codex-tools` unit tests and
`codex-core` tool spec tests.

## Validation
- `just fmt`
- `cargo test -p codex-tools`
- `cargo test -p codex-core spawn_agent_description`
- `git diff --check`

## Notes
- `cargo test -p codex-core` was also attempted, but unrelated
environment-sensitive tests failed with the active local environment.
Examples: approvals reviewer defaults observed `AutoReview` instead of
`User`, request-permissions event tests did not emit events, and
proxy-env tests saw `http://127.0.0.1:50604` from the active proxy
environment.

Co-authored-by: Codex <noreply@openai.com>
2026-04-24 15:13:54 +02:00
jif-oai
9eadff9713 chore: alias max_concurrent_threads_per_session (#19354) 2026-04-24 14:33:03 +02:00
jif-oai
120aa07d81 Make MultiAgentV2 interruption markers assistant-authored (#19124)
## Why

`MultiAgentV2` follow-up messages are delivered to agents as
assistant-authored `InterAgentCommunication` envelopes. When
`followup_task` used `interrupt: true`, the interrupted-turn guidance
was still persisted as a contextual user message, so model-visible
history made a system-generated interruption boundary look
user-authored.

This keeps interruption guidance consistent with the rest of the v2
inter-agent message stream while preserving the legacy marker shape for
non-v2 sessions.

## What changed

- Make `interrupted_turn_history_marker` feature-aware.
- Record the interrupted-turn marker as an assistant `OutputText`
message when `Feature::MultiAgentV2` is enabled.
- Keep the existing user contextual fragment for non-v2 sessions.
- Apply the same feature-aware marker to interrupted fork snapshots.
- Add coverage for the live `followup_task` interrupt path and the
helper-level v2 marker shape.

## Testing

- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_assistant_output_message --
--nocapture`
- `cargo test -p codex-core interrupted_fork_snapshot -- --nocapture`
2026-04-24 13:39:26 +02:00
sayan-oai
c10f95ddac Update models.json and related fixtures (#19323)
Supersedes #18735.

The scheduled rust-release-prepare workflow force-pushed
`bot/update-models-json` back to the generated models.json-only diff,
which dropped the test and snapshot updates needed for CI.

This PR keeps the latest generated `models.json` from #18735 and adds
the corresponding fixture updates:
- preserve model availability NUX in the app-server model cache fixture
- update core/TUI expectations for the new `gpt-5.4` `xhigh` default
reasoning
- refresh affected TUI chatwidget snapshots for the `gpt-5.5`
default/model copy changes

Validation run locally while preparing the fix:
- `just fmt`
- `cargo test -p codex-app-server model_list`
- `cargo test -p codex-core includes_no_effort_in_request`
- `cargo test -p codex-core
includes_default_reasoning_effort_in_request_when_defined_by_model_info`
- `cargo test -p codex-tui --lib chatwidget::tests`
- `cargo insta pending-snapshots`

---------

Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
2026-04-24 11:14:13 +02:00
Eric Traut
ddfa691752 Surface reasoning tokens in exec JSON usage (#19308)
## Summary

Fixes #19022.

`codex exec --json` currently emits `turn.completed.usage` with input,
cached input, and output token counts, but drops the reasoning-token
split that Codex already receives through thread token usage updates.
Programmatic consumers that rely on the JSON stream, especially
ephemeral runs that do not write rollout files, need this field to
accurately display reasoning-model usage.

This PR adds `reasoning_output_tokens` to the public exec JSON `Usage`
payload and maps it from the existing `ThreadTokenUsageUpdated` total
token usage data.

## Verification

- Added coverage to
`event_processor_with_json_output::token_usage_update_is_emitted_on_turn_completion`
so `turn.completed.usage.reasoning_output_tokens` is asserted.
- Updated SDK expectations for `run()` and `runStreamed()` so TypeScript
consumers see the new usage field.
- Ran `cargo test -p codex-exec`.
- Ran `pnpm --filter ./sdk/typescript run build`.
- Ran `pnpm --filter ./sdk/typescript run lint`.
- Ran `pnpm --filter ./sdk/typescript exec jest --runInBand
--testTimeout=30000`.
2026-04-24 01:54:11 -07:00
Eric Traut
6f87eb0479 Hide unsupported MCP bearer_token from config schema (#19294)
## Summary

Fixes #19275.

Codex runtime rejects inline MCP `bearer_token` config entries and asks
users to configure `bearer_token_env_var` instead, but the generated
config schema still advertised `mcp_servers.<name>.bearer_token` as a
supported field. That made editor/schema validation disagree with
runtime validation.

This keeps `bearer_token` in `RawMcpServerConfig` so Codex can continue
producing the targeted runtime error for recent or existing configs, but
skips the field during schemars generation. The checked-in
`core/config.schema.json` fixture now exposes `bearer_token_env_var`
without exposing unsupported inline `bearer_token`.

## Verification

- Added `config_schema_hides_unsupported_inline_mcp_bearer_token` to
assert the generated schema hides `bearer_token` while preserving
`bearer_token_env_var`.
- Ran `cargo test -p codex-config`.
- Ran `cargo test -p codex-core config_schema`.
2026-04-24 00:17:43 -07:00
sayan-oai
e083b6c757 chore: apply truncation policy to unified_exec (#19247)
we were not respecting turn's `truncation_policy` to clamp output tokens
for `unified_exec` and `write_stdin`.

this meant truncation was only being applied by `ContextManager` before
the output was stored in-memory (so it _was_ being truncated from
model-visible context), but the full output was persisted to rollout on
disk.

now we respect that `truncation_policy` and `ContextManager`-level
truncation remains a backup.

### Tests
added tests, tested locally.
2026-04-24 00:17:39 -07:00