## Why
Runtime decisions should not infer permissions from the lossy legacy
sandbox projection once `PermissionProfile` is available. In particular,
`Disabled` and `External` need to remain distinct, and managed profiles
with split filesystem or deny-read rules should not be collapsed before
approval, network, safety, or analytics code makes decisions.
## What Changed
- Changes managed network proxy setup and network approval logic to use
`PermissionProfile` when deciding whether a managed sandbox is active.
- Migrates patch safety, Guardian/user-shell approval paths, Landlock
helper setup, analytics sandbox classification, and selected
turn/session code to profile-backed permissions.
- Validates command-level profile overrides against the constrained
`PermissionProfile` rather than a strict `SandboxPolicy` round trip.
- Preserves configured deny-read restrictions when command profiles are
narrowed.
- Adds coverage for profile-backed trust, network proxy/approval
behavior, patch safety, analytics classification, and command-profile
narrowing.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19393).
* #19395
* #19394
* __->__ #19393
## Why
Config loading had become split across crates: `codex-config` owned the
config types and merge logic, while `codex-core` still owned the loader
that assembled the layer stack. This change consolidates that
responsibility in `codex-config`, so the crate that defines config
behavior also owns how configs are discovered and loaded.
To make that move possible without reintroducing the old dependency
cycle, the shell-environment policy types and helpers that
`codex-exec-server` needs now live in `codex-protocol` instead of
flowing through `codex-config`.
This also makes the migrated loader tests more deterministic on machines
that already have managed or system Codex config installed by letting
tests override the system config and requirements paths instead of
reading the host's `/etc/codex`.
## What Changed
- moved the config loader implementation from `codex-core` into
`codex-config::loader` and deleted the old `core::config_loader` module
instead of leaving a compatibility shim
- moved shell-environment policy types and helpers into
`codex-protocol`, then updated `codex-exec-server` and other downstream
crates to import them from their new home
- updated downstream callers to use loader/config APIs from
`codex-config`
- added test-only loader overrides for system config and requirements
paths so loader-focused tests do not depend on host-managed config state
- cleaned up now-unused dependency entries and platform-specific cfgs
that were surfaced by post-push CI
## Testing
- `cargo test -p codex-config`
- `cargo test -p codex-core config_loader_tests::`
- `cargo test -p codex-protocol -p codex-exec-server -p
codex-cloud-requirements -p codex-rmcp-client --lib`
- `cargo test --lib -p codex-app-server-client -p codex-exec`
- `cargo test --no-run --lib -p codex-app-server`
- `cargo test -p codex-linux-sandbox --lib`
- `cargo shear`
- `just bazel-lock-check`
## Notes
- I did not chase unrelated full-suite failures outside the migrated
loader surface.
- `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive
failures on this machine, and Windows CI still shows unrelated
long-running/timeouting test noise outside the loader migration itself.
## Why
App-server request handling had a lot of repeated JSON-RPC error
construction and one-off `send_error`/`return` branches. This made small
handlers noisy and pushed error response details into leaf code that
otherwise only needed to validate input or call the underlying API.
## What Changed
- Added shared JSON-RPC error constructors in
`codex-rs/app-server/src/error_code.rs`.
- Lifted straightforward request result emission into
`codex-rs/app-server/src/message_processor.rs` so response/error
dispatch happens at the request boundary.
- Reused the result helpers across command exec, config, filesystem,
device-key, external-agent config, fs-watch, and outgoing-message paths.
- Removed leaf wrapper handlers where the method body was only
forwarding to a response helper.
- Returned request validation errors upward in the simple cases instead
of sending an error locally and immediately returning.
## Verification
- `cargo test -p codex-app-server --lib command_exec::tests`
- `cargo test -p codex-app-server --lib outgoing_message::tests`
- `cargo test -p codex-app-server --lib in_process::tests`
- `cargo test -p codex-app-server --test all v2::fs`
- `cargo test -p codex-app-server --test all v2::config_rpc`
- `cargo test -p codex-app-server --test all v2::external_agent_config`
- `cargo test -p codex-app-server --test all v2::initialize`
- `just fix -p codex-app-server`
- `git diff --check`
Note: full `cargo test -p codex-app-server` was attempted and stopped in
`message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
with a stack overflow after unrelated tests had already passed.
## Why
After #19391, `PermissionProfile` and the split filesystem/network
policies could still be stored in parallel. That creates drift risk: a
profile can preserve deny globs, external enforcement, or split
filesystem entries while a cached projection silently loses those
details. This PR makes the profile the runtime source and derives
compatibility views from it.
## What Changed
- Removes stored filesystem/network sandbox projections from
`Permissions` and `SessionConfiguration`; their accessors now derive
from the canonical `PermissionProfile`.
- Derives legacy `SandboxPolicy` snapshots from profiles only where an
older API still needs that field.
- Updates MCP connection and elicitation state to track
`PermissionProfile` instead of `SandboxPolicy` for auto-approval
decisions.
- Adds semantic filesystem-policy comparison so cwd changes can preserve
richer profiles while still recognizing equivalent legacy projections
independent of entry ordering.
- Updates config/session tests to assert profile-derived projections
instead of parallel stored fields.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19392).
* #19395
* #19394
* #19393
* __->__ #19392
## Why
This supersedes #19391. During stack repair, GitHub marked #19391 as
merged into a temporary stack branch rather than into `main`, so the
runtime-config change needed a fresh PR.
`PermissionProfile` is now the canonical permissions shape after #19231
because it can distinguish `Managed`, `Disabled`, and `External`
enforcement while also carrying filesystem rules that legacy
`SandboxPolicy` cannot represent cleanly. Core config and session state
still needed to accept profile-backed permissions without forcing every
profile through the strict legacy bridge, which rejected valid runtime
profiles such as direct write roots.
The unrelated CI/test hardening that previously rode along with this PR
has been split into #19683 so this PR stays focused on the permissions
model migration.
## What Changed
- Adds `Permissions.permission_profile` and
`SessionConfiguration.permission_profile` as constrained runtime state,
while keeping `sandbox_policy` as a legacy compatibility projection.
- Introduces profile setters that keep `PermissionProfile`, split
filesystem/network policies, and legacy `SandboxPolicy` projections
synchronized.
- Uses a compatibility projection for requirement checks and legacy
consumers instead of rejecting profiles that cannot round-trip through
`SandboxPolicy` exactly.
- Updates config loading, config overrides, session updates, turn
context plumbing, prompt permission text, sandbox tags, and exec request
construction to carry profile-backed runtime permissions.
- Preserves configured deny-read entries and `glob_scan_max_depth` when
command/session profiles are narrowed.
- Adds `PermissionProfile::read_only()` and
`PermissionProfile::workspace_write()` presets that match legacy
defaults.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606).
* #19395
* #19394
* #19393
* #19392
* __->__ #19606
## Summary
This PR lets programmatic AgentIdentity users provide one token through
either stdin login or environment auth.
`codex login --with-agent-identity` reads an Agent Identity JWT from
stdin, validates that it has the required claims, and stores that token
as the `agent_identity` value in `auth.json`. The file format is
token-only; the decoded account and key fields are runtime state, not
hand-authored auth.json fields.
The Agent Identity JWT claim shape and decoder live in
`codex-agent-identity`; `codex-login` only owns env/storage precedence
and conversion into `CodexAuth::AgentIdentity`.
When env auth is enabled, `CODEX_AGENT_IDENTITY` can provide the same
JWT without writing auth state to disk. `CODEX_API_KEY` still wins if
both env vars are set.
Reference old stack: https://github.com/openai/codex/pull/17387/changes
Reference JWT/env stack: https://github.com/openai/codex/pull/18176
## Stack
1. https://github.com/openai/codex/pull/18757: full revert
2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
crate
3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity
auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate Codex backend
auth callsites through AuthProvider
5. This PR: accept AgentIdentity JWTs through login/env
## Testing
Tests: targeted login and Agent Identity crate tests, CLI checks, scoped
formatter/linter cleanup, and CI.
---------
Co-authored-by: Shijie Rao <shijie.rao@openai.com>
## Why
Windows Bazel runs in the permissions stack exposed that app-server
integration tests were launching normal plugin startup warmups in every
subprocess. Those warmups can call
`https://chatgpt.com/backend-api/plugins/featured` when a test is not
specifically exercising plugin startup, which adds slow background work,
noisy stderr, and dependence on external network state. The relevant
startup/featured-plugin behavior was introduced across #15042 and
#15264.
A few app-server tests also had long optional waits or unbounded cleanup
paths, making failures expensive to diagnose and contributing to slow
Windows shards. One external-agent config test from #18246 used a
GitHub-style marketplace source, which was enough to exercise the
pending remote-import path but also meant the background completion task
could attempt a real clone.
## What Changed
- Adds explicit `AppServerRuntimeOptions` / `PluginStartupTasks`
plumbing and a hidden debug-only
`--disable-plugin-startup-tasks-for-tests` app-server flag, so
integration tests can suppress startup plugin warmups without adding a
production env-var gate.
- Has the app-server test harness pass that hidden flag by default,
while opting plugin-startup coverage back in for tests that
intentionally exercise startup sync and featured-plugin warmup behavior.
- Lowers normal app-server subprocess logging from `info`/`debug` to
`warn` to avoid multi-megabyte stderr output in Bazel logs.
- Prevents the external-agent config test from attempting a real
marketplace clone by using an invalid non-local source while still
exercising the pending-import completion path.
- Bounds optional filesystem/realtime waits and fake WebSocket
test-server shutdown so failures produce targeted timeouts instead of
hanging a shard.
- Fixes the Unix script-resolution test in `rmcp-client` to exercise
PATH resolution directly and include the actual spawn error in failures.
## Verification
- `cargo check -p codex-app-server`
- `cargo clippy -p codex-app-server --tests -- -D warnings`
- `cargo test -p codex-rmcp-client
program_resolver::tests::test_unix_executes_script_without_extension`
- `cargo test -p codex-app-server --test all
external_agent_config_import_sends_completion_notification_after_pending_plugins_finish
-- --nocapture`
- `cargo test -p codex-app-server --test all
plugin_list_uses_warmed_featured_plugin_ids_cache_on_first_request --
--nocapture`
- Windows Local Bazel passed with this test-hardening bundle before it
was extracted from #19606.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19683).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19683
This removes the hidden `codex responses` CLI subcommand after
confirming no downstream callers rely on it, deleting the raw Responses
passthrough implementation, unregistering the subcommand, and dropping
the now-unused CLI dependencies on `codex-api` and
`codex-model-provider`.
Some providers of Responses API forward a model-defined `end_turn`
boolean indicating explicitly the model's indication of whether it would
like to end the turn or to be inferenced again. In this PR, we update
the sampling loop to use this field correctly if it's set. If the field
is not set by the provider, we fall back to the existing sampling logic.
Fixes multiple scrollback and terminal resize issues: #5538, #5576,
#8352, #12223, #16165, and #15380.
## Why
Codex writes finalized transcript output into terminal scrollback after
wrapping it for the current viewport width. A later terminal resize
could leave that scrollback shaped for the old width, so wider windows
kept narrow output and narrower windows could show stale wrapping
artifacts until enough new output replaced the visible area.
This is also the foundation PR for responsive markdown tables. Table
rendering needs finalized transcript content to be width-sensitive after
insertion, not only while content is first streaming. Markdown table
rendering itself stays in #18576.
## Stack
- PR1: resize backlog reflow and interrupt cleanup
- #18576: markdown table support
## What Changed
- Rebuild source-backed transcript history when the terminal width
changes. `terminal_resize_reflow` is introduced through the experimental
feature system, but is enabled by default for this rollout so we can
validate behavior across real terminals.
- Preserve assistant and plan stream source so finalized streaming
output can participate in resize reflow after consolidation.
- Debounce resize work, but force a final source-backed reflow when a
resize happened during active or unconsolidated streaming output.
- Clear stale pending history lines on resize so old-width wrapped
output is not emitted just before rebuilt scrollback.
- Bound replay work with `[tui.terminal_resize_reflow].max_rows`:
omitted uses terminal-specific defaults, `0` keeps all rendered rows,
and a positive value sets an explicit cap. The cap applies both while
initially replaying a resumed transcript into scrollback and when
rebuilding scrollback after terminal resize.
- Consolidate interrupted assistant streams before cleanup, then clear
pending stream output and active-tail state consistently.
- Move resize reflow and thread event buffering helpers out of `app.rs`
into dedicated TUI modules.
- Add focused coverage for resize reflow, feature-gated behavior,
streaming source preservation, interrupted output cleanup,
unicode-neutral text, terminal-specific row caps, and composer/layout
stability.
## Runtime Bounds
Resize reflow keeps only the most recent rendered rows when a row cap is
active. The default is `auto`, which maps to the detected terminal's
default scrollback size where Codex can identify it: VS Code `1000`,
Windows Terminal `9001`, WezTerm `3500`, and Alacritty `10000`.
Terminals without a dedicated mapping use the conservative fallback of
`1000` rows. Users can override this with `[tui.terminal_resize_reflow]
max_rows = N`, or set `max_rows = 0` to disable row limiting.
## Validation
- `just fmt`
- `git diff --check`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui reflow`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui
transcript_reflow`
- `just fix -p codex-tui`
- PR CI in progress on the squashed branch
## Why
For npm/Bun-managed installs, the update prompt was treating the latest
GitHub release as ready to install. During the `0.124.0` release, GitHub
and npm visibility were not atomic: the root npm wrapper could become
visible before the npm registry marked that version as the package
`latest`. That left a window where users could be prompted to upgrade
before npm was ready for the release.
## What changed
- Keep GitHub Releases as the candidate latest-version source for
npm/Bun installs, but only write the existing `version.json` cache after
npm registry metadata proves that same root version is ready.
- Add `codex-rs/tui/src/npm_registry.rs` to validate npm readiness by
checking `dist-tags.latest` and root package `dist` metadata for the
GitHub candidate version.
- Move version parsing helpers into
`codex-rs/tui/src/update_versions.rs` so that logic can be tested
without compiling the release-only `updates.rs` module under tests.
- Update `.github/workflows/rust-release.yml` so the six known platform
tarballs publish before the root `@openai/codex` wrapper. Other npm
tarballs publish before the root wrapper, and the SDK publishes after
the root package it depends on.
I think raising it to 45 minutes in
https://github.com/openai/codex/pull/19578 was a mistake for the reasons
explained in the comments in the code. Instead, we attempt to defend
against timeouts by increasing the number of shards in
`app-server-all-test` so that a "true failure" that gets run 3x should
not take as much wall clock time.
## Why
Windows can represent the same canonical local path with either a normal
drive path or a verbatim device path prefix. The failure pattern that
motivated this PR was an assertion diff like `C:\...` versus
`\\?\C:\...`: different spellings, same file.
That became visible while validating the permissions stack above this
PR. The stack increasingly routes paths through `AbsolutePathBuf`, which
normalizes supported Windows device prefixes, while several existing
tests still built expected values directly with
`std::fs::canonicalize()` or compared `AbsolutePathBuf::as_path()` to a
raw `PathBuf`. On Windows, that can make tests fail because the two
sides choose different textual forms for an otherwise equivalent
canonical path.
This PR is intentionally split out as the bottom PR below #19606. The
runtime permissions migration should not carry unrelated Windows test
stabilization, and reviewers should be able to verify this as a
test-only change before looking at the larger permissions changes.
## Failure Modes Covered
- `conversation_summary` expected rollout paths were built from raw
canonicalized `PathBuf`s, while app-server responses could carry
`AbsolutePathBuf`-normalized paths.
- `thread_resume` compared returned thread paths directly to previously
stored or fixture paths, so a verbatim-prefix spelling could fail an
otherwise correct resume.
- `marketplace_add` compared plugin install roots through `as_path()`
against raw canonicalized paths, reproducing the same `C:\...` versus
`\\?\C:\...` mismatch in both app-server and core-plugin coverage.
## What Changed
- In `app-server/tests/suite/conversation_summary.rs`, normalize both
expected rollout paths and received `ConversationSummary.path` values
through `AbsolutePathBuf` before comparing the full summary object.
- In `app-server/tests/suite/v2/thread_resume.rs`, normalize both sides
of thread path comparisons before asserting equality. This keeps the
tests focused on whether resume returned the same existing path, not
whether Windows used the same string spelling.
- In `app-server/tests/suite/v2/marketplace_add.rs` and
`core-plugins/src/marketplace_add.rs`, compare install roots as
`AbsolutePathBuf` values instead of comparing an absolute-path wrapper
to a raw canonicalized `PathBuf`.
## Behavior
This PR does not change production app-server or marketplace behavior.
It only changes tests to assert semantic path identity across Windows
path spelling variants. It also leaves API response values untouched;
the normalization happens inside assertions only.
## Verification
Targeted local checks run while extracting this fix:
- `cargo test -p codex-app-server
get_conversation_summary_by_thread_id_reads_rollout`
- `cargo test -p codex-app-server
get_conversation_summary_by_relative_rollout_path_resolves_from_codex_home`
- `cargo test -p codex-app-server
thread_resume_prefers_path_over_thread_id`
Windows-specific confidence comes from the Bazel Windows CI job for this
PR, since the failure is platform-specific.
## Docs
No docs update is needed because this is test-only infrastructure
stabilization.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19604).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19604
## Why
`sandbox_permissions = "require_escalated"` is treated as an explicit
request to approve the command and run it outside the
filesystem/platform sandbox. Before this change, shell and unified exec
still registered managed network approval context and could inject
Codex-managed proxy state into the child process, which meant an
approved escalated command could still hit a second network approval
path.
This PR makes that escalation boundary consistent: once a command is
explicitly approved to run outside the sandbox, Codex does not also
route that process through the managed network proxy.
## Security impact
Command/filesystem sandbox approval now implies network approval for
that command. If an untrusted command or script is allowed to run with
`require_escalated`, its network calls are unsandboxed: Codex-managed
network allowlists and denylists are not respected for that process, so
the command can exfiltrate any data it can read.
## What changed
- Skip managed network approval specs for
`SandboxPermissions::RequireEscalated`.
- Pass `network: None` into shell, zsh-fork shell, and unified exec
sandbox preparation for explicitly escalated requests.
- Strip Codex-managed proxy environment variables when
`CODEX_NETWORK_PROXY_ACTIVE` is present, while preserving user proxy env
when the Codex marker is absent.
- Add regression coverage for the prepared exec request so the old
behavior cannot silently reappear.
## Verification
- `cargo test -p codex-core explicit_escalation`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
## Why
Fixes#19499.
The slash-command popup recalculated the command-name column from only
the rows visible in the current viewport. That made the description
column shift horizontally while scrolling through `/` commands whenever
longer command names entered or left the visible window.
## What Changed
`codex-rs/tui/src/bottom_pane/command_popup.rs` now uses the shared
selection-popup `AutoAllRows` column-width mode for both height
measurement and rendering. This keeps the command description column
based on the full filtered slash-command list instead of the current
viewport.
## Verification
- `cargo test -p codex-tui bottom_pane::command_popup`
Follow-up to #19266.
## Why
`thread_start_with_non_local_thread_store_does_not_create_local_persistence`
is meant to catch accidental local thread persistence when a non-local
thread store is configured. The Windows flake reported in [this
BuildBuddy
invocation](https://app.buildbuddy.io/invocation/0b75dde4-6828-4e7b-a35b-e45b73fb005d)
showed that the assertion was tripping on an unexpected top-level `.tmp`
entry:
```diff
{
+ ".tmp",
"config.toml",
"installation_id",
"memories",
"skills",
}
```
That `.tmp` does not appear to come from `tempfile::TempDir`; it comes
from unrelated plugin startup work that can legitimately materialize
`codex_home/.tmp`, including the startup remote plugin sync marker in
[`core/src/plugins/startup_sync.rs`](bce74c70ce/codex-rs/core/src/plugins/startup_sync.rs (L13-L15))
and the curated plugin snapshot under
[`.tmp/plugins`](bce74c70ce/codex-rs/core-plugins/src/startup_sync.rs (L25-L26)).
That makes the regression race unrelated background startup tasks
instead of validating the thread-store invariant it was added to cover.
Rather than weakening the assertion to allow arbitrary `.tmp` entries,
this change isolates the test from plugin warmups so it can stay strict
about unexpected local thread persistence artifacts.
## What changed
- disable plugins in the generated config used by
`app-server/tests/suite/v2/remote_thread_store.rs`
- keep the existing `codex_home` assertions unchanged so the test still
fails if local session or sqlite persistence is introduced
## Verification
- `cargo test -p codex-app-server
suite::v2::remote_thread_store::thread_start_with_non_local_thread_store_does_not_create_local_persistence
-- --exact`
Fixes#15219.
## Why
`thread/resume` should continue a persisted thread with the same model
provider that created the thread. The app server already restores the
persisted model and reasoning effort before resuming, but it was leaving
`model_provider` unset. If a user created a thread with one provider and
later switched their active profile to another provider, resumed
encrypted history could be sent to the wrong endpoint and fail with
`invalid_encrypted_content`.
The thread metadata already records the original provider, so resume
should apply it when the caller has not explicitly requested a different
model/provider/reasoning configuration.
## What changed
This updates `merge_persisted_resume_metadata` in
`app-server/src/codex_message_processor.rs` to copy
`ThreadMetadata::model_provider` into `ConfigOverrides::model_provider`
alongside the persisted model.
The existing resume metadata tests now also assert that:
- the persisted provider is restored for normal resume
- explicit model, provider, or reasoning-effort overrides still prevent
persisted resume metadata from being applied
- a thread with no persisted model or reasoning effort still resumes
with its persisted provider
## Verification
- `cargo test -p codex-app-server` passed the app-server unit tests,
including the updated resume metadata coverage. The broader integration
portion of that command failed in an unrelated environment-sensitive
skills-budget warning assertion, where this run saw 8 omitted skills
instead of the expected 7.
- `just fix -p codex-app-server` completed successfully.
Unfortunately, if most of the build graph is invalidated such that there
are few cache hits, the Windows Bazel build for all the tests often
takes more than `30` minutes, so this PR increases the timeout to `45`
minutes until we set up distributed builds.
## Why
The visibility cleanup in the base PR reduced what `codex-mcp` exposes,
but several files still made reviewers read private support machinery
before the public or crate-facing entry points. This ordering pass makes
each file easier to scan: exported API first, crate-visible MCP
internals next, then private helpers in breadth-first order from the
higher-level MCP flows to leaf utilities.
## What Changed
- Reordered `codex-mcp` exports so the runtime, configuration, snapshot,
auth, and helper surfaces are grouped by visibility and reader
importance.
- Moved public and crate-visible MCP items ahead of private helpers in
the auth, MCP planning/snapshot, connection manager, and tool-name
modules.
- Kept the change mechanical, with no behavior changes intended.
## Verification
- `cargo check -p codex-mcp`
## Why
`codex-mcp` currently exposes more API than the rest of the workspace
uses. Some of that surface is simply visibility that can be tightened,
and some of it is public helper code that remains compiler-valid because
it is exported even though no workspace caller uses it.
That distinction matters: Rust does not warn on exported API just
because the current workspace does not call it. This PR intentionally
treats those exported-but-workspace-unreferenced paths as stale
`codex-mcp` surface. The main example is MCP skill dependency
collection, where the active implementation now lives in
`codex-rs/core/src/mcp_skill_dependencies.rs`; keeping the older
`codex-mcp` copy makes it unclear which implementation owns skill MCP
installation.
## What Changed
- Pruned unused `codex-mcp` re-exports from `codex-mcp/src/lib.rs`.
- Removed non-runtime helper methods from `McpConnectionManager` so it
stays focused on live MCP clients.
- Made `ToolPluginProvenance` lookup methods crate-private.
- Removed workspace-unreferenced snapshot wrapper APIs and
qualified-tool grouping helpers.
- Deleted the duplicate `codex-mcp` skill dependency module and tests
now that skill MCP dependency handling is owned by `core`.
## Verification
- `cargo check -p codex-mcp`
## Summary
- Mark `unavailable_dummy_tools` as a stable feature and enable it by
default
- Update the feature registry test to match the new default state
## Testing
- `just fmt`
- `cargo test -p codex-features`
## Why
Issue #19418 points out a small grammar issue in `codex-rs/README.md`
under "Code Organization." The current sentence says "we hope this to
be," which reads awkwardly.
Fixes#19418.
## What changed
Updated the `core/` crate description so the sentence reads "we hope
this becomes a library crate."
## Verification
Documentation-only change. Reviewed the Markdown diff.
## Why
Recent `main` CI repeatedly timed out in:
- `codex-core::all suite::approvals::approval_matrix_covers_all_modes`
It failed in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
[24905823212](https://github.com/openai/codex/actions/runs/24905823212),
[24903439629](https://github.com/openai/codex/actions/runs/24903439629),
[24903336028](https://github.com/openai/codex/actions/runs/24903336028),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).
The failure pattern was a 60s Linux remote timeout. Logs showed many
approval scenarios completing before the single matrix test timed out.
## Root Cause
`approval_matrix_covers_all_modes` packed every approval/sandbox/tool
scenario into one test case. That made the test vulnerable to normal CI
variance: one slow scenario or a slow process startup could push the
whole monolithic case past the 60s per-test timeout. It also hid which
part of the matrix was slow because the runner only reported the one
large matrix test.
## What Changed
- Keep the shared `scenarios()` table as the single source of approval
matrix coverage.
- Use one `#[test_case]` per `ScenarioGroup` to generate five async
Tokio tests: danger/full-access, read-only, workspace-write,
apply-patch, and unified-exec.
- Keep the group runner small and add per-scenario error context so a
failure still reports the specific scenario name.
## Why This Should Be Reliable
Each scenario group now has its own test harness timeout instead of
sharing one timeout window with the full matrix. That removes the long
sequential loop from a single test while keeping the implementation
compact and easy to scan.
The tests still run through the same scenario definitions and runner, so
this preserves coverage. `test-case` already composes with
`#[tokio::test]` in this crate and is already available for test code.
## Verification
- `cargo test -p codex-core --test all approval_matrix_ -- --list`
- `cargo test -p codex-core --test all approval_matrix_`
Adds the TUI user experience for goals on top of the core runtime from
PR 4.
## Why
Users need a direct TUI control surface for long-running goals. The UI
should make the current goal visible, support common goal actions
without waiting for a model turn, and avoid confusing end-of-turn
notifications while an active goal is immediately continuing.
## What changed
- Added `/goal` summary rendering for the current goal, including
active, paused, budget-limited, and complete states.
- Added `/goal <objective>` creation/replacement through the app-server
goal API rather than a model prompt.
- Added `/goal clear`, `/goal pause`, and `/goal unpause` command
variants.
- Added a confirmation menu when the user enters a new goal while
another goal already exists.
- Updated `/goal` help and summary tip text so it reflects the supported
command variants without advertising slash-command token budgets.
- Added footer/statusline goal indicators, including elapsed time and
token budget display when a budget exists from API/tool-created goals.
- Consumes goal updated/cleared notifications so the TUI stays in sync
with external app-server changes.
- Suppresses end-of-turn desktop notifications only when a goal is still
active and follow-up work is expected.
- Preserves slash-command history behavior and avoids leaking queued
`/goal` state into unrelated submissions.
## Verification
- Added TUI unit and snapshot coverage for goal command availability,
summary rendering, control commands, replacement menu behavior,
status/footer display, notification handling, and command history.
Adds the core runtime behavior for active goals on top of the model
tools from PR 3.
## Why
A long-running goal should be a core runtime concern, not something
every client has to implement. Core owns the turn lifecycle, tool
completion boundaries, interruptions, resume behavior, and token usage,
so it is the right place to account progress, enforce budgets, and
decide when to continue work.
## What changed
- Centralized goal lifecycle side effects behind
`Session::goal_runtime_apply(GoalRuntimeEvent::...)`.
- Starts goal continuation turns only when the session is idle; pending
user input and mailbox work take priority.
- Accounts token and wall-clock usage at turn, tool, mutation,
interrupt, and resume boundaries; `get_thread_goal` remains read-only.
- Preserves sub-second wall-clock remainder across accounting boundaries
so long-running goals do not drift downward over time.
- Treats token budget exhaustion as a soft stop by marking the goal
`budget_limited` and injecting wrap-up steering instead of aborting the
active turn.
- Suppresses budget steering when `update_goal` marks a goal complete.
- Pauses active goals on interrupt and auto-reactivates paused goals
when a thread resumes outside plan mode.
- Suppresses repeated automatic continuation when a continuation turn
makes no tool calls.
- Added continuation and budget-limit prompt templates.
## Verification
- Added focused core coverage for continuation scheduling, accounting
boundaries, budget-limit steering, completion accounting, interrupt
pause behavior, resume auto-activation, and wall-clock remainder
accounting.
Adds the model-facing goal tools on top of the app-server API from PR 2.
## Why
Once goals are persisted and exposed to clients, the model needs a
small, constrained tool surface for goal workflows. The tool contract
should let the model inspect goals, create them only when explicitly
requested, and mark them complete without giving it broad control over
user/runtime-owned state.
## What changed
- Added `get_goal`, `create_goal`, and `update_goal` tool specs behind
the `goals` feature flag.
- Added core goal tool handlers that validate objectives and token
budgets before mutating persisted state.
- Constrained `create_goal` to create only when no goal exists, with
optional `token_budget` only when a budget is explicitly provided.
- Tightened the `create_goal` instructions so the model does not infer
goals from ordinary task requests.
- Constrained `update_goal` to expose only goal completion; pause,
resume, clear, and budget-limited transitions remain user- or
runtime-controlled.
- Registered the goal tools in the tool registry and kept them out of
review contexts where they should not appear.
## Verification
- Added tool-registry coverage for feature gating and tool availability.
- Added core session tests for create/get/update behavior, duplicate
goal rejection, budget validation, and completion-only updates.
Adds the app-server v2 goal API on top of the persisted goal state from
PR 1.
## Why
Clients need a stable app-server surface for reading and controlling
materialized thread goals before the model tools and TUI can use them.
Goal changes also need to be observable by app-server clients, including
clients that resume an existing thread.
## What changed
- Added v2 `thread/goal/get`, `thread/goal/set`, and `thread/goal/clear`
RPCs for materialized threads.
- Added `thread/goal/updated` and `thread/goal/cleared` notifications so
clients can keep local goal state in sync.
- Added resume/snapshot wiring so reconnecting clients see the current
goal state for a thread.
- Added app-server handlers that reconcile persisted rollout state
before direct goal mutations.
- Updated the app-server README plus generated JSON and TypeScript
schema fixtures for the new API surface.
## Verification
- Added app-server v2 coverage for goal get/set/clear behavior,
notification emission, resume snapshots, and non-local thread-store
interactions.
Adds the persisted goal foundation for the rest of the stack. This PR is
intentionally limited to feature flag and state-layer behavior;
app-server APIs, model tools, runtime continuation, and TUI UX are
layered in later PRs.
## Why
Goal mode needs durable thread-level state before clients or model tools
can safely build on it. The state layer needs to know whether a goal
exists, what objective it tracks, whether it is active, paused,
budget-limited, or complete, and how much time/token usage has already
been accounted.
## What changed
- Added the `goals` feature flag and generated config schema entry.
- Added the `thread_goals` state table and Rust model for persisted
thread goals.
- Added state runtime APIs for creating, replacing, updating, deleting,
and accounting goal usage.
- Added `goal_id`-based stale update protection so an old goal update
cannot overwrite a replacement.
- Kept this PR scoped to persistence and state runtime behavior, with no
app-server, model-facing, continuation, or TUI behavior yet.
## Verification
- Added state runtime coverage for goal creation, replacement, stale
update protection, status transitions, token-budget behavior, and usage
accounting.
## Summary
Fix a Bazel-only path resolution bug in
`codex_utils_cargo_bin::cargo_bin`.
Under Bazel runfiles, `rlocation` can return a relative `bazel-out/...`
path even though `cargo_bin()` documents that it returns an absolute
path. That can break callers that store the returned binary path and
later spawn it after changing cwd, because the relative path is resolved
from the wrong directory.
This patch absolutizes the runfiles-resolved path before returning it.
## Summary
- update Codex issue automation to pin `openai/codex-action` to
`5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02`, the commit for `v1.7`
- keep the release intent visible with `# v1.7` comments beside the hash
pins
## Test plan
- `git diff --check`
- `yq e '.' .github/workflows/issue-labeler.yml`
- `yq e '.' .github/workflows/issue-deduplicator.yml`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`ReadOnlyAccess` was a transitional legacy shape on `SandboxPolicy`:
`FullAccess` meant the historical read-only/workspace-write modes could
read the full filesystem, while `Restricted` tried to carry partial
readable roots. The partial-read model now belongs in
`FileSystemSandboxPolicy` and `PermissionProfile`, so keeping it on
`SandboxPolicy` makes every legacy projection reintroduce lossy
read-root bookkeeping and creates unnecessary noise in the rest of the
permissions migration.
This PR makes the legacy policy model narrower and explicit:
`SandboxPolicy::ReadOnly` and `SandboxPolicy::WorkspaceWrite` represent
the old full-read sandbox modes only. Split readable roots, deny-read
globs, and platform-default/minimal read behavior stay in the runtime
permissions model.
## What changed
- Removes `ReadOnlyAccess` from
`codex_protocol::protocol::SandboxPolicy`, including the generated
`access` and `readOnlyAccess` API fields.
- Updates legacy policy/profile conversions so restricted filesystem
reads are represented only by `FileSystemSandboxPolicy` /
`PermissionProfile` entries.
- Keeps app-server v2 compatible with legacy `fullAccess` read-access
payloads by accepting and ignoring that no-op shape, while rejecting
legacy `restricted` read-access payloads instead of silently widening
them to full-read legacy policies.
- Carries Windows sandbox platform-default read behavior with an
explicit override flag instead of depending on
`ReadOnlyAccess::Restricted`.
- Refreshes generated app-server schema/types and updates tests/docs for
the simplified legacy policy shape.
## Verification
- `cargo check -p codex-app-server-protocol --tests`
- `cargo check -p codex-windows-sandbox --tests`
- `cargo test -p codex-app-server-protocol sandbox_policy_`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19449).
* #19395
* #19394
* #19393
* #19392
* #19391
* __->__ #19449
## Why
When using the Amazon Bedrock provider with `openai.gpt-5.4-cmb`, the
model picker allowed `xhigh` because the CMB catalog entry was derived
from the bundled `gpt-5.4` reasoning metadata. Bedrock rejects that
effort level, causing the request to fail before the turn can run:
```text
{"error":{"code":"validation_error","message":"Failed to deserialize the JSON body into the target type: Invalid 'reasoning': Invalid 'effort': unknown variant `xhigh`, expected one of `high`, `low`, `medium`, `minimal` at line 1 column 77239","param":null,"type":"invalid_request_error"}}
```
## What Changed
- Replace the runtime lookup of bundled `gpt-5.4` metadata for
`openai.gpt-5.4-cmb` with an explicit Bedrock CMB `ModelInfo` entry.
- Advertise only the Bedrock-supported CMB reasoning levels: `minimal`,
`low`, `medium`, and `high`.
- Keep the existing GPT OSS Bedrock model metadata and reasoning levels
unchanged.
- Add catalog coverage for the hardcoded CMB metadata and
Bedrock-compatible reasoning level list.
## Why
This prepares feedback log capture for a future remote app-server hook
sink without changing the current local SQLite upload path. The
important boundary is now intentionally small: a log sink is a tracing
`Layer` that can also flush entries it has accepted.
That keeps the existing SQLite implementation simple while giving the
upcoming gRPC sink a place to fit beside it. SQLite and gRPC have
different worker/write semantics, so this PR avoids introducing a shared
buffered-sink abstraction and instead lets each `LogWriter` own the
buffering mechanics it needs.
## What Changed
- Added `LogSinkQueueConfig` with the existing local defaults: queue
capacity `512`, batch size `128`, and flush interval `2s`.
- Added `LogDbLayer::start_with_config(...)` while preserving
`LogDbLayer::start(...)` and `log_db::start(...)` defaults.
- Introduced the `LogWriter` trait as the minimal shared interface:
`tracing_subscriber::Layer` plus `flush()`.
- Made `LogDbLayer` implement `LogWriter`.
- Kept tracing event formatting inside `LogDbLayer`; it still creates
one `LogEntry` per tracing event before queueing it for SQLite.
- Kept normal event capture best-effort and non-blocking via bounded
`try_send`.
## Behavior Notes
This does not change the SQLite schema, retention behavior,
`/feedback/upload`, or Sentry upload behavior. Normal log events still
drop when the queue is full; explicit `flush()` still waits for queue
capacity and receiver processing before returning.
## Verification
- `cargo test -p codex-state log_db`
- `cargo test -p codex-state`
- `just fix -p codex-state`
The added tests cover configured batch-size flushing, configured
interval flushing, queue-full drops, and the flush barrier semantics.
## Summary
- include the outer tool `call_id` in Codex Apps MCP request metadata
under `_meta._codex_apps.call_id`
- preserve existing Codex Apps metadata like `resource_uri` and
`contains_mcp_source`
- add request metadata coverage for both the existing-metadata and
no-existing-metadata cases
## Why
The paired backend change in
[openai/openai#850796](https://github.com/openai/openai/pull/850796)
updates MCP compliance logging to prefer `_meta._codex_apps.call_id`
instead of the JSON-RPC request id. This client change sends that outer
tool call id so the backend can record the model/tool call identifier
when it is available.
This is wire-compatible with older backends because `_meta._codex_apps`
is already reserved backend-only metadata. Backends that do not read
`call_id` will ignore the extra field.
## Testing
- `cargo test -p codex-core request_meta`
- `just fmt`
- `just fix -p codex-core`
- Add an integration test that guarantees nothing gets written to codex
home dir or sqlite when running a rollout with a non-local ThreadStore
- Add an in-memory "spy" ThreadStore for tests like this
Note I could not find a good way to also ensure there were no filesystem
_reads_ that didn't go through threadstore. I explored a more elaborate
sandboxed-subprocess approach but it isn't platform portable and felt
like it wasn't (yet) worth it.
## Summary
- Mirrors the OpenAI Docs skill cleanup in the bundled Codex skill copy
- Clarifies reasoning-effort recommendation wording
- Replaces internal snake_case prompt block names with natural-language
guidance aligned to the prompting guide
## Test plan
- `git diff --check`
- Verified the old snake_case prompt block names no longer appear in the
bundled upgrade guide
## Why
The VS Code extension and desktop app do not need the full TUI binary,
and `codex-app-server` is materially smaller than standalone `codex`. We
still want to publish it as an official release artifact, but building
it by tacking another `--bin` onto the existing release `cargo build`
invocations would lengthen those jobs.
This change keeps `codex-app-server` on its own release bundle so it can
build in parallel with the existing `codex` and helper bundles.
## What changed
- Made `.github/workflows/rust-release.yml` bundle-aware so each macOS
and Linux MUSL target now builds either the existing `primary` bundle
(`codex` and `codex-responses-api-proxy`) or a standalone `app-server`
bundle (`codex-app-server`).
- Preserved the historical artifact names for the primary macOS/Linux
bundles so `scripts/stage_npm_packages.py` and
`codex-cli/scripts/install_native_deps.py` continue to find release
assets under the paths they already expect, while giving the new
app-server artifacts distinct names.
- Added a matching `app-server` bundle to
`.github/workflows/rust-release-windows.yml`, and updated the final
Windows packaging job to download, sign, stage, and archive
`codex-app-server.exe` alongside the existing release binaries.
- Generalized the shared signing actions in
`.github/actions/linux-code-sign/action.yml`,
`.github/actions/macos-code-sign/action.yml`, and
`.github/actions/windows-code-sign/action.yml` so each workflow row
declares its binaries once and reuses that list for build, signing, and
staging.
- Added `codex-app-server` to `.github/dotslash-config.json` so releases
also publish a generated DotSlash manifest for the standalone app-server
binary.
- Kept the macOS DMG focused on the existing `primary` bundle;
`codex-app-server` ships as the regular standalone archives and DotSlash
manifest.
## Verification
- Parsed the modified workflow and action YAML files locally with
`python3` + `yaml.safe_load(...)`.
- Parsed `.github/dotslash-config.json` locally with `python3` +
`json.loads(...)`.
- Reviewed the resulting release matrices, artifact names, and packaging
paths to confirm that `codex-app-server` is built separately on macOS,
Linux MUSL, and Windows, while the existing npm staging and Windows
`codex` zip bundling contracts remain intact.
## Summary
- Mirrors openai/skills#374 in the Codex bundled OpenAI Docs skill
- Adds `gpt-image-2` as the best image generation/edit model
- Updates `gpt-image-1.5` to less expensive image generation/edit
quality
## Test plan
- `git diff --check`
## Why
We already prefer shipping the MUSL Linux builds, and the in-repo
release consumers resolve Linux release assets through the MUSL targets.
Keeping the GNU release jobs around adds release time and extra assets
without serving the paths we actually publish and consume.
This is also easier to reason about as a standalone change: future work
can point back to this PR as the intentional decision to stop publishing
`x86_64-unknown-linux-gnu` and `aarch64-unknown-linux-gnu` release
artifacts.
## What changed
- Removed the `x86_64-unknown-linux-gnu` and `aarch64-unknown-linux-gnu`
entries from the `build` matrix in `.github/workflows/rust-release.yml`.
- Added a short comment in that matrix documenting that Linux release
artifacts intentionally ship MUSL-linked binaries.
## Verification
- Reviewed `.github/workflows/rust-release.yml` to confirm that the
release workflow now only builds Linux release artifacts for
`x86_64-unknown-linux-musl` and `aarch64-unknown-linux-musl`.
- Route cold thread/resume and thread/fork source loading through
ThreadStore reads instead of direct rollout path operations
- Keep lookups that explicitly specify a rollout-path using the local
thread store methods but return an invalid-request error for remote
ThreadStore configurations
- Add some additional unit tests for code path coverage
## Why
The profile conversion path still required a `cwd` even when it was only
translating a legacy `SandboxPolicy` into a `PermissionProfile`. That
made profile producers invent an ambient `cwd`, which is exactly the
anchoring we are trying to remove from permission-profile data. A legacy
workspace-write policy can be represented symbolically instead: `:cwd =
write` plus read-only `:project_roots` metadata subpaths.
This PR creates that cwd-free base so the rest of the stack can stop
threading cwd through profile construction. Callers that actually need a
concrete runtime filesystem policy for a specific cwd still have an
explicitly named cwd-bound conversion.
## What Changed
- `PermissionProfile::from_legacy_sandbox_policy` now takes only
`&SandboxPolicy`.
- `FileSystemSandboxPolicy::from_legacy_sandbox_policy` is now the
symbolic, cwd-free projection for profiles.
- The old concrete projection is retained as
`FileSystemSandboxPolicy::from_legacy_sandbox_policy_for_cwd` for
runtime/boundary code that must materialize legacy cwd behavior.
- Workspace-write profiles preserve `CurrentWorkingDirectory` and
`ProjectRoots` special entries instead of materializing cwd into
absolute paths.
## Verification
- `cargo check -p codex-protocol -p codex-core -p
codex-app-server-protocol -p codex-app-server -p codex-exec -p
codex-exec-server -p codex-tui -p codex-sandboxing -p
codex-linux-sandbox -p codex-analytics --tests`
- `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol
-p codex-app-server -p codex-exec -p codex-exec-server -p codex-tui -p
codex-sandboxing -p codex-linux-sandbox -p codex-analytics`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19414).
* #19395
* #19394
* #19393
* #19392
* #19391
* __->__ #19414
Selection menus in the TUI currently let disabled rows interfere with
numbering and default focus. This makes mixed menus harder to read and
can land selection on rows that are not actionable. This change updates
the shared selection-menu behavior in list_selection_view so disabled
rows are not selected when these views open, and prevents them from
being numbered like selectable rows.
- Disabled rows no longer receive numeric labels
- Digit shortcuts map to enabled rows only
- Default selection moves to the first enabled row in mixed menus
- Updated affected snapshot
- Added snapshot coverage for a plugin detail error popup
- Added a focused unit test for shared selection-view behavior
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- Switch Unix socket app-server connections to perform the standard
WebSocket HTTP Upgrade handshake
- Update the Unix socket test to exercise a real upgrade over the Unix
stream
- Refresh the app-server README to describe the new Unix socket behavior
## Testing
- `cargo test -p codex-app-server transport::unix_socket_tests`
- `just fmt`
- `git diff --check`
## Why
`thread/fork` responses intentionally include copied history so the
caller can render the fork immediately, but `thread/started` is a
lifecycle notification. The v2 `Thread` contract says notifications
should return `turns: []`, and the fork path was reusing the response
thread directly, causing copied turns to be emitted through
`thread/started` as well.
## What Changed
- Route app-server `thread/started` notification construction through a
helper that clears `thread.turns` before sending.
- Keep `thread/fork` responses unchanged so callers still receive copied
history.
- Add persistent and ephemeral fork coverage that asserts
`thread/started` emits an empty `turns` array while the response retains
fork history.
## Testing
- `just fmt`
- `cargo test -p codex-app-server`
## Why
`openai.gpt-5.4-cmb` is served through the Amazon Bedrock provider,
whose request validator currently accepts `function` and `mcp` tool
specs but rejects Responses `custom` tools. The CMB catalog entry reuses
the bundled `gpt-5.4` metadata, which marks `apply_patch_tool_type` as
`freeform`. That causes Codex to include an `apply_patch` tool with
`type: "custom"`, so even heavily disabled sessions can fail before the
model runs with:
```text
Invalid tools: unknown variant `custom`, expected `function` or `mcp`
```
This is provider-specific: the model should still expose `apply_patch`,
but for Bedrock it needs to use the JSON/function tool shape instead of
the freeform/custom shape.
## What Changed
- Override the `openai.gpt-5.4-cmb` static catalog entry to set
`apply_patch_tool_type` to `function` after inheriting the rest of the
`gpt-5.4` model metadata.
- Update the catalog test expectation so the CMB entry continues to
track `gpt-5.4` metadata except for this Bedrock-specific tool shape
override.
## Verification
- `cargo test -p codex-model-provider`
- `just fix -p codex-model-provider`