## Why
`multi_agents_v2` is meant to be independently gated from the older
`collab` feature. The tool registry still treated the
collaboration-style agent tools as `collab`-only, so enabling
`multi_agents_v2` without `collab` omitted the v2 agent tools. Review
and guardian sub-sessions also need to keep agent spawning disabled even
when the outer session has `multi_agents_v2` enabled.
## What changed
- Include the collab-backed agent tools when either `multi_agents_v2` or
`collab` is enabled.
- Explicitly disable `multi_agents_v2` for review and guardian review
sub-sessions, matching the existing `spawn_csv` and `collab`
restrictions.
- Add a registry test that enables `multi_agents_v2`, disables `collab`,
and verifies the v2 agent tools are present while legacy `send_input`
and `resume_agent` remain hidden.
## Testing
- Added
`test_build_specs_multi_agent_v2_does_not_require_collab_feature`.
## Why
Fixes#20145.
`config/value/write` treats a JSON `null` value as a request to clear
the config key. Clearing a key that is already absent should be
idempotent, but clearing a nested key such as `features.personality`
from an empty `config.toml` returned `configPathNotFound` because
`clear_path` treated the missing `features` parent table as an error.
That makes app-server reset flows brittle because clients have to read
first and avoid sending a clear request unless the parent path already
exists.
## What Changed
- Updated app-server config clearing so missing intermediate tables, or
non-table parents, are treated as an unchanged no-op.
- Removed the now-unreachable `MergeError::PathNotFound` path from
config write merging.
- Added a regression test covering `features.personality = null` against
an empty user config.
## Verification
- `cargo test -p codex-app-server clear_missing_nested_config_is_noop`
- `cargo test -p codex-app-server` was run; the config manager unit
suite passed, but one unrelated integration test failed because
`turn_start_emits_thread_scoped_warning_notification_for_trimmed_skills`
expected `7` trimmed skills and observed `8`.
- `just fix -p codex-app-server`
1. Adds v2 plugin/share/save, plugin/share/list, and plugin/share/delete
RPCs.
2. Implements save by archiving a local plugin root, enforcing a size
limit, uploading through the workspace upload flow, and supporting
updates via remotePluginId.
3. Lists created workspace plugins
4. Deletes a previously uploaded/shared plugin.
## Why
#20271 increased the `90`-minute timeout in `rust-release.yml`, but it
did not update the reusable Windows workflow in
`rust-release-windows.yml`. As a result, the Windows release compile
jobs were still capped at `60` minutes and the `windows-x64` primary
build could continue timing out.
We are keeping the existing `90`-minute timeout in `rust-release.yml`.
That increase was still directionally correct because the top-level
release build benefits from extra headroom; the mistake was assuming it
also covered the reusable Windows jobs.
## What Changed
- increase the reusable Windows release workflow timeouts in
`rust-release-windows.yml` from `60` minutes to `90` minutes
- update the comment in `rust-release.yml` so it no longer implies that
the top-level timeout covers the Windows reusable jobs
## Why
After `hooks/list` exposes the hook inventory, clients need a way to
persist user hook preferences, make those changes effective in
already-open sessions, and distinguish user-controllable hooks from
managed requirements without adding another bespoke app-server write
API.
## What
- Extends `hooks/list` entries with effective `enabled` state.
- Persists user-level hook state under `hooks.state.<hook-id>` so the
model can grow beyond a single boolean over time.
- Uses the existing `config/batchWrite` path for hook state updates
instead of introducing a dedicated hook write RPC.
- Refreshes live session hook engines after config writes so
already-open threads observe updated enablement without a restart.
## Stack
1. openai/codex#19705
2. openai/codex#19778
3. This PR - openai/codex#19840
4. openai/codex#19882
## Reviewer Notes
The generated schema files account for much of the raw diff. The core
behavior is in:
- `hooks/src/config_rules.rs`, which resolves per-hook user state from
the config layer stack.
- `hooks/src/engine/discovery.rs`, which projects effective enablement
into `hooks/list` from source-derived managedness.
- `config/src/hook_config.rs`, which defines the new `hooks.state`
representation.
- `core/src/session/mod.rs`, which rebuilds live hook state after user
config reloads.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- [x] Move the allowlist out of core crate
- [x] Add Teams, SharePoint, Outlook Email, and Outlook Calendar to the
tool_suggest discoverable plugin allowlist
- [x] Add focused coverage for Microsoft curated plugin discovery
## Testing
- just fmt
- cargo test -p codex-core-plugins
- cargo test -p codex-core
list_tool_suggest_discoverable_plugins_returns_
## Why
Reused Guardian review trunks can still have older child-turn events
queued when a later review starts. The review waiter currently accepts
the first terminal event it sees from the shared child session, so a
stale `TurnComplete` can be attributed to the new review. That produces
impossible analytics combinations such as non-null TTFT with sub-10 ms
completion latency and zero token deltas on `trunk_reused` reviews.
## What changed
- Preserve the child turn id returned by the Guardian review
`Op::UserTurn` submission.
- Restrict Guardian review waiting to events correlated with that
submitted child turn.
- Restrict timeout/abort draining to terminal events for the same child
turn.
- Add regression coverage for stale prior-turn completions, stale
prior-turn errors, and interrupt draining in
`codex-rs/core/src/guardian/review_session.rs`.
## Verification
- `cargo test -p codex-core guardian::review_session::tests::`
- `cargo clippy -p codex-core --tests -- -D warnings`
## Why
Fixes#20264.
Side conversations are an ephemeral layer on top of the main chat.
Pressing `Ctrl+D` from an empty side-chat composer should unwind back to
the parent thread, matching the existing side-return behavior, instead
of falling through to the global quit shortcut and exiting Codex.
## What changed
The side-return shortcut matcher now treats `Ctrl+D` the same way it
already treats `Esc` and `Ctrl+C`. Because app-level side-return
handling runs before the chat widget's global quit handling, this
returns from `/side` while preserving normal `Ctrl+D` quit behavior
outside side conversations.
The existing shortcut coverage was updated to include lowercase and
uppercase `Ctrl+D` key events.
## Verification
- `cargo test -p codex-tui
side_return_shortcuts_match_esc_ctrl_c_and_ctrl_d`
- `cargo test -p codex-tui` starts successfully and the new shortcut
test passes, but the broader suite later aborts in the unrelated
existing test
`app::tests::attach_live_thread_for_selection_rejects_unmaterialized_fallback_threads`
with a stack overflow.
Summary:
- Return from external agent import before session history import
finishes
- Run session import work in the background and emit the existing
completion notification when it is done
- Serialize session imports so duplicate requests do not create
duplicate imported threads
Verification:
- cargo test -p codex-app-server external_agent_config_
- cargo test -p codex-external-agent-sessions
- just fix -p codex-app-server
- just fix -p codex-external-agent-sessions
- git diff --check
## Summary
- Support Claude Code `ai-title` / `aiTitle` records when detecting and
importing external agent sessions.
- Preserve existing `custom-title` / `customTitle` precedence; only fall
back to `aiTitle` when no custom title is present.
- Add coverage for both detection and import title selection, including
the custom-title-over-ai-title case.
## Testing
- `cargo test -p codex-external-agent-sessions`
- `just fix -p codex-external-agent-sessions`
## Why
We need a way to list the available hooks to expose via the TUI and App
so users can view and manage their hooks
## What
- Adds `hooks/list` for one or more `cwd` values that returns discovered
hook metadata
## Stack
1. openai/codex#19705
2. This PR - openai/codex#19778
3. openai/codex#19840
4. openai/codex#19882
## Review Notes
The generated schema files account for most of the raw diff, these files
have the core change:
- `hooks/src/engine/discovery.rs` builds the inventory entries during
hook discovery while leaving runtime handlers focused on execution.
- `app-server/src/codex_message_processor.rs` wires `hooks/list` into
the app-server flow for each requested `cwd`.
- `app-server-protocol/src/protocol/v2.rs` defines the new v2
request/response payloads exposed on the wire.
### Core Changes
`core/src/plugins/manager.rs` adds `plugins_for_layer_stack(...)` so
`skills/list` and `hooks/list`can resolve plugin state for each
requested `cwd`
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
update the local login success page to match the Codex desktop auth UX
use theme-aware colors and an inline 20px Codex mark
keep the actual localhost success page aligned with the browser auth UX
PR
## Tests
<img width="1728" height="1117" alt="Screenshot 2026-04-29 at 12 00
34 PM"
src="https://github.com/user-attachments/assets/76a40c3f-07c3-452c-97da-e7c43717cd2c"
/>
## Summary
Enforce FileSystemSandboxPolicy protected metadata names in the Linux
bubblewrap adapter so `.git`, `.agents`, and `.codex` remain read only
inside writable workspace roots unless the policy grants an explicit
write carveout.
## Scope
1. Translate protected metadata names from FileSystemSandboxPolicy into
bubblewrap masks for existing metadata paths.
2. Represent missing protected metadata paths as guarded mount targets
so agents cannot create `.git`, `.agents`, or `.codex` under writable
roots.
3. Preserve normal git discovery for existing repos, worktrees, and
parent repos.
4. Keep explicit user write grants working when policy allows a
protected metadata path directly.
## Not in scope
1. No shell preflight UX.
2. No TUI runtime profile propagation.
3. No macOS Seatbelt changes in this PR.
## Reviewer focus
1. This should be reviewed as the Linux enforcement adapter for the
policy primitive from PR 19846.
2. macOS enforcement already landed in PR 19847.
3. The important invariant is that `FileSystemSandboxPolicy` is the
source of truth for `.git`, `.agents`, and `.codex`.
## Validation
1. `git diff` whitespace check passed.
2. `cargo fmt` check passed with the existing stable rustfmt warning
about `imports_granularity`.
3. Full Linux sandbox Cargo test suite passed on the devbox.
4. Devbox forty six case suite passed at head
`012accb703c13bd28df5b40079a9bf183036336a`.
5. Devbox summary: pass 46, fail 0.
6. The devbox suite was run through `just c sandbox linux`.
7. Focused repo test for Viyat parent repo case passed on the devbox.
## Summary
- remove the Windows-specific unified-exec environment block from tool
selection
- keep `unified_exec` default-off on Windows unless the feature is
explicitly enabled
- normalize model-provided `shell_type = unified_exec` to
`shell_command` when the feature is disabled
- drop obsolete tests tied to the removed environment gate and keep the
feature-flag regression coverage
## Why
Now that the session/long-lived process backend is implemented for the
Windows sandbox, we don't need to hard disable it anymore. We will be
rolling out slowly using a feature gate.
## Impact
This allows manual Windows opt-in in CLI and app-backed flows while
preserving the existing default-off behavior for Windows users.
---------
Co-authored-by: canvrno-oai <kbond@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Summary:
- Add a checked-in codex-core public API listing generated by
cargo-public-api.
- Add scripts/regen-public-api.sh with an embedded crate list,
auto-install for cargo-public-api 0.51.0, pinned nightly, and --check
mode.
- Add Rust CI jobs on the codex Linux x64 runner pool to verify the
listing stays up to date.
Testing:
- bash -n scripts/regen-public-api.sh
- just regen-public-api --check
- yq '.' .github/workflows/rust-ci.yml
.github/workflows/rust-ci-full.yml
- git diff --check
## Summary
Persisted subagent parent/child topology currently leaks through
`StateRuntime`'s SQLite-specific thread-spawn helpers. This PR
introduces a narrow `AgentGraphStore` boundary so follow-up work can
route graph operations through a local or remote store without coupling
orchestration code directly to the state DB graph API.
## Changes
- Adds the new `codex-agent-graph-store` crate.
- Defines a flat `AgentGraphStore` trait for the v1 graph surface:
upsert edge, set edge status, list direct children, and list
descendants.
- Adds public graph types for `ThreadSpawnEdgeStatus`,
`AgentGraphStoreError`, and `AgentGraphStoreResult`.
- Implements `LocalAgentGraphStore` on top of an existing
`codex_state::StateRuntime`, preserving today's SQLite-backed
`thread_spawn_edges` behavior.
- Registers the crate in Cargo/Bazel metadata.
This PR only adds the local contract and implementation; call-site
migration and the remote gRPC store are left to the follow-up PRs in the
stack.
## Testing
- `cargo test -p codex-agent-graph-store`
The new unit tests cover local parity with the existing `StateRuntime`
graph methods, `Open`/`Closed` filtering, status updates, and stable
breadth-first descendant ordering.
Plugin MCP servers are loaded from plugin manifests rather than
top-level `[mcp_servers]`, so their tool approval preferences need to be
stored and applied through the owning plugin config. Without this,
choosing "Always allow" for a plugin MCP tool could write a preference
that was not reliably used on later tool calls.
## Summary
- Add plugin-scoped MCP policy config under
`plugins.<plugin>.mcp_servers`, including server enablement, tool
allow/deny lists, server defaults, and per-tool approval modes.
- Overlay plugin MCP policy onto manifest-provided server configs when
plugins are loaded.
- Route persistent "Always allow" writes for plugin MCP tools back to
the owning `plugins.<plugin>.mcp_servers.<server>.tools.<tool>` config
entry.
- Reload user config after persisting an approval and make the plugin
load cache config-aware so stale plugin MCP policy is not reused after
`config.toml` changes.
- Regenerate the config schema and add coverage for plugin MCP policy
loading, approval lookup, persistence, and stale-cache prevention.
## Testing
- `cargo test -p codex-config`
- `cargo test -p codex-core-plugins`
- `cargo test -p codex-core --lib plugin_mcp`
## Why
`x-codex-turn-metadata` is sent as an HTTP/WebSocket header, but Codex
was serializing the metadata JSON with raw UTF-8 string contents. When a
workspace path contains non-ASCII characters, common HTTP stacks can
reject or corrupt that header before the request reaches the provider.
Fixes#17468. Also addresses the duplicate WebSocket report in #19581.
## What changed
- Added `codex_utils_string::to_ascii_json_string`, a shared helper that
serializes JSON normally while escaping non-ASCII string content as
`\uXXXX`.
- Switched turn metadata header serialization, including merged
Responses API client metadata, to use the ASCII-safe JSON helper.
- Added coverage for non-ASCII workspace paths and non-ASCII client
metadata while preserving the same parsed JSON values.
## Verification
- `cargo test -p codex-utils-string`
- `cargo test -p codex-core turn_metadata`
- `just bazel-lock-check`
## Why
We have run into two avoidable problems when introducing async trait
APIs in Rust:
- `#[async_trait]` has caused materially worse build times in this
repository.
- `#[allow(async_fn_in_trait)]` makes it too easy to ship a public trait
without spelling out whether the returned future is `Send`, which hides
an important part of the trait contract.
We already have a good example of the preferred alternative in
[#16630](https://github.com/openai/codex/pull/16630) /
[`3c7f013f9735`](https://github.com/openai/codex/commit/3c7f013f9735),
but that guidance currently lives only as prior art in the codebase.
This PR documents the rule in `AGENTS.md` so contributors are more
likely to follow the native RPITIT pattern before these two shortcuts
spread further.
## What Changed
- added Rust guidance in `AGENTS.md` discouraging both `#[async_trait]`
and `#[allow(async_fn_in_trait)]`
- pointed contributors to the native RPITIT pattern with explicit `Send`
bounds on the returned future
- clarified that implementations may still use `async fn` when they
satisfy that trait contract
## Verification
- docs-only change; no tests run
Summary
- Add `[features.apps_mcp_path_override]` config with a `path` field for
overriding only the built-in apps MCP path.
- Keep existing host/base URL derivation unchanged and append the
configured path after that base.
- Regenerate the config schema with the custom feature-config case.
Test Plan
- Not run for latest revision; only `just fmt` and `just
write-config-schema` were run.
- Earlier revision: `cargo test -p codex-features`
- Earlier revision: `cargo test -p codex-mcp`
## Summary
- Keep the preferred ChatGPT login callback port `1455` first.
- Preserve the existing `/cancel` recovery for stale Codex login
servers.
- Fall back to the registered localhost callback port `1457` when `1455`
remains unavailable.
## Why
Cursor and Codex Desktop both use the ChatGPT account login callback
server. On Windows, Cursor can already be listening on `127.0.0.1:1455`
/ `[::1]:1455`, causing Codex Desktop sign-in to fail with:
`Local callback port 1455 is already in use on this machine.`
Codex already attempted to cancel a stale Codex login server on that
port, but if the listener does not release the port, the old behavior
was to fail. The new behavior falls back to `1457`, which matches the
fixed redirect URI being registered server-side in
`openai/openai#863817`. This keeps the OAuth `redirect_uri` inside
Hydra's exact allow-list instead of choosing an arbitrary ephemeral
port.
## Validation
- `just fmt`
- `cargo test -p codex-login`
- `git diff --check HEAD~1..HEAD`
## Why
The precursor PR keeps successful client responses typed until
app-server's outgoing response seam. This follow-up uses that seam to
move successful client-response analytics out of individual handlers and
into the shared sender path, while keeping filtering decisions inside
`codex-analytics`.
## What changed
- Emit successful client-response analytics centrally from
`OutgoingMessageSender::send_response`.
- Remove duplicate handler-local response tracking for the current
thread/turn lifecycle responses.
- Keep analytics ingestion selective inside `AnalyticsEventsClient`, so
unrelated client traffic is ignored before cloning or boxing.
- Collapse client-response analytics facts onto one typed path and
normalize payloads in the reducer.
- Add direct client-filter coverage plus sender-level coverage for the
centralized forwarding path.
## Verification
- `cargo test -p codex-analytics`
- `cargo test -p codex-app-server outgoing_message::tests --lib`
## Summary
- Fetch remote plugin detail before sending the uninstall request.
- Use the detail response to derive the marketplace namespace and plugin
name for cache cleanup.
- Stop the uninstall before the backend POST if detail lookup fails, so
backend state and local cache state do not diverge.
## Testing
- `just fmt`
- `cargo test -p codex-app-server plugin_uninstall`
- `cargo test -p codex-core-plugins`
- `git diff --check`
## Why
`pr17088` adds typed server-originated request/response plumbing, but
successful client responses are still erased into bare JSON-RPC `result`
values before app-server can make any typed decision about them.
This precursor PR keeps successful client responses typed until the
outgoing response seam. It is intentionally limited to
protocol/app-server plumbing so the analytics behavior change can review
separately on top.
## What changed
- Add `ClientResponsePayload` as the pre-serialization client response
body type.
- Route app-server successful response paths through the typed payload
seam while preserving existing handler-local analytics behavior.
- Keep `InterruptConversation` JSON-RPC-only because it has no
`ClientResponse` variant.
- Move the new payload conversion tests into a dedicated protocol test
module.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server-protocol`
## Why
[#17088](https://github.com/openai/codex/pull/17088) changed
`OutgoingMessageSender::new` to require an `AnalyticsEventsClient`, but
one `command_exec` test added earlier on `main` still called the old
one-argument constructor. That leaves current `main` failing to compile
in Bazel and argument-comment-lint jobs.
## What changed
- Pass `AnalyticsEventsClient::disabled()` to the missed
`OutgoingMessageSender::new` test call site in `command_exec.rs`.
## Verification
- `cargo test -p codex-app-server
timeout_or_cancellation_reports_cancellation_without_timeout_exit_code`
## Summary
- Tighten `tool_suggest` guidance so it prefers explicit plugin install
requests, while still allowing a connector install when the relevant
plugin is already installed and a needed connector from that plugin is
missing.
- Tell the model not to call `tool_suggest` in parallel with other
tools.
## Testing
- `cargo test -p codex-tools tool_suggest`
- `cargo test -p codex-core tool_suggest`
## Why
Codex analytics needs a typed seam for app-server-originated
request/response traffic so future tool-approval analytics can consume
those facts without adding bespoke callsite tracking each time. Server
responses arrive as JSON-RPC `id + result` payloads, so analytics has to
reconstruct the matching typed response from the original typed request
while that request context still exists in app-server.
This also puts analytics on the app-server outbound path, which needs to
avoid keeping the runtime alive during shutdown. The final ownership fix
keeps the normal strong auth-manager retention in analytics and makes
the external-auth refresh bridge hold a weak back-reference to
`OutgoingMessageSender`, breaking the runtime cycle at the bridge
boundary instead of exposing retention policy through the analytics
client API.
## What changed
- Adds typed `ServerRequest` and `ServerResponse` analytics facts, plus
`AnalyticsEventsClient::track_server_request` and
`track_server_response`.
- Renames the existing client-side facts to `ClientRequest` and
`ClientResponse` so reducers can distinguish client-to-server traffic
from server-to-client traffic.
- Adds `ServerRequest::response_from_result`, allowing a stored typed
request to decode the matching typed server response from a raw JSON-RPC
result payload.
- Threads `AnalyticsEventsClient` through `OutgoingMessageSender` and
records targeted server requests, replayed targeted requests, and
matching targeted responses with the responding connection id needed for
correlation.
- Intentionally leaves broadcast server requests/responses out of
analytics for now because the current model is per connection, while
broadcasts fan one logical request out across multiple connections.
- Breaks the app-server shutdown cycle by storing
`Weak<OutgoingMessageSender>` in `ExternalAuthRefreshBridge` and
upgrading it only when an external-auth refresh is actually requested.
- Keeps reducer ingestion of the new server-side facts as no-ops for
now; this PR is plumbing for later tool-approval analytics work.
## Verification
- `cargo test -p codex-analytics`
- `cargo test -p codex-app-server outgoing_message::tests::`
- Covers typed-response reconstruction plus the targeted, replayed,
broadcast-exclusion, and response-attribution analytics paths.
## Follow-up
This PR intentionally stops at ingestion plumbing, so `ServerRequest`
and `ServerResponse` facts are still reducer no-ops. Once a follow-up PR
adds real downstream analytics output for those facts:
- replace the temporary pre-reducer observation seam with reducer tests
for the emitted event shape;
- add end-to-end coverage in `app-server/tests/suite/v2/analytics.rs`
for the real app-server workflow and captured analytics payload;
- remove the temporary sender-level observer tests added here in favor
of the real-output coverage above.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17088).
* #18748
* #18747
* #17090
* #17089
* #20241
* #20239
* __->__ #17088
## Why
This bug is exposed by Guardian/auto-review approvals. With the managed
network proxy enabled, a blocked network request can be reported back
through the network approval service as an approval denial after the
command has already started. Before this change, the shell and unified
exec runtimes registered those network approval calls, but did not have
a way to observe an async proxy denial as a cancellation/failure signal
for the running process.
The result was confusing: Guardian/auto-review could correctly deny
network access, but the command path could keep running or unregister
the approval without surfacing the denial as the command failure.
## What Changed
- `NetworkApprovalService` now attaches a cancellation token to active
and deferred network approvals.
- Proxy-denial outcomes are recorded only for active registrations,
cancel the owning token, and are consumed when the approval is
finalized.
- The shell runtime combines the normal command timeout with the
network-denial cancellation token.
- Unified exec stores the deferred network approval object, terminates
tracked processes when the proxy denial arrives, and returns the denial
as a process failure while polling or completing the process.
- Tool orchestration passes the active network approval cancellation
token into the sandbox attempt and preserves deferred approval errors
instead of silently unregistering them.
- App-server `command/exec` now handles the combined
timeout-or-cancellation expiration variant used by the runtime.
## Verification
- `cargo test -p codex-core network_approval --lib`
- `cargo clippy -p codex-app-server --all-targets -- -D warnings`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
---------
Co-authored-by: Codex <noreply@openai.com>
- Fetches and caches remote /installed plugin state
- Lets skills/list load skills from remote-installed cached plugins
without requiring a local marketplace entry
- Routes plugin list/startup/install/uninstall changes through async
plugin cache invalidation and MCP refresh
## Summary
- include the live auto-review trunk rollout when `/feedback` uploads
logs
- upload that attachment as
`auto-review-rollout-<parent-thread-id>.jsonl` so it is distinguishable
from the parent rollout
- show the same auto-review attachment name in the TUI consent popup
## Scope
- this only covers the live cached auto-review trunk for the current
parent thread
- it does not add durable historical parent->auto-review lookup
- it does not add persisted rollout support for ephemeral parallel
review forks
## UI
<img width="599" height="185" alt="Screenshot 2026-04-28 at 1 17 18 PM"
src="https://github.com/user-attachments/assets/6a0e79c2-5d21-4702-8a89-f765778bc9e9"
/>
## Validation
- `cargo test -p codex-core
cached_guardian_subagent_exposes_its_rollout_path`
- `cargo test -p codex-feedback`
- `cargo test -p codex-app-server`
- `cargo test -p codex-tui feedback_upload_consent_popup_snapshot`
- `cargo test -p codex-tui
feedback_good_result_consent_popup_includes_connectivity_diagnostics_filename`
## Known unrelated local failures
- `cargo test -p codex-core` currently fails in the pre-existing proxy
env snapshot test
`tools::runtimes::tests::maybe_wrap_shell_lc_with_snapshot_keeps_user_proxy_env_when_proxy_inactive`
- `cargo test -p codex-tui` currently hits pre-existing `status::*`
snapshot drift unrelated to this change
## Follow-Up
- persist parallel auto-review fork sessions so /feedback can include
their rollout history too
- attach each persisted fork as its own clearly named file, for example
auto-review-rollout-<parent-thread-id>-fork <n>.jsonl, instead of
merging multiple Guardian sessions into one attachment
- keep the same live-session-only scope initially; durable historical
parent -> auto-review lookup can remain a separate decision if we later
need feedback from resumed sessions
## Summary
- add a regression test for
`InterAgentCommunication::to_response_input_item`
- assert replayed inter-agent messages keep `phase:
Some(MessagePhase::Commentary)`
## Test plan
- `cargo test -p codex-protocol`
- `just argument-comment-lint`
Summary:
- Add codex-thread-manager-sample, a one-shot binary that starts a
ThreadManager thread, submits a prompt, and prints the final assistant
output.
- Pass ThreadStore into ThreadManager::new and expose
thread_store_from_config for existing callsites.
- Build the sample Config directly with only --model and prompt inputs.
Verification:
- just fmt
- cargo check -p codex-thread-manager-sample -p codex-app-server -p
codex-mcp-server
- git diff --check
Tests: Not run per request.
### Summary
- Path-based local thread reads currently return rollout/session git
metadata directly, so `thread/resume` can disagree with persisted SQLite
metadata for the same thread.
- Merge non-null SQLite git fields over rollout-path reads while keeping
rollout values as fallbacks for fields SQLite does not know.
- Add focused regression coverage for rollout-path reads so persisted
branch updates are preserved during resume.
### Testing
- `cargo test -p codex-thread-store`
## Why
This is part 3 of a 7-PR stack to remove direct
`codex_protocol::protocol` usage from `codex-tui` while keeping each
layer reviewable and shippable.
With `AppCommand` now explicit, the internal app event bus can carry TUI
commands directly instead of bouncing through core `Op` values.
## What changed
- Changed `AppEvent::CodexOp` and `AppEvent::SubmitThreadOp` to carry
`AppCommand`.
- Updated app-event senders and direct emitters to submit `AppCommand`
values.
- Adjusted tests to match `AppCommand` or convert back through
`into_core()` where they intentionally assert legacy payload equality.
## Verification
- `cargo test -p codex-tui --no-run`
## Why
This is part 2 of a 7-PR stack to remove direct
`codex_protocol::protocol` usage from `codex-tui` while keeping each
layer reviewable and shippable.
Before the TUI event bus can stop carrying core `Op` values,
`AppCommand` needs to be an owned TUI command shape rather than a thin
wrapper around `Op`.
## What changed
- Replaced the opaque `AppCommand(Op)` wrapper with explicit owned
variants for the commands the TUI submits.
- Preserved `into_core()` so this layer does not yet change the
app/thread submission boundary.
- Kept existing core leaf types for now so this remains a mechanical
command-shape refactor.
## Verification
- `cargo check -p codex-tui`
In some setups the summary or raw content can be dropped between
requests. This triggers a check in the reducer which expects that the
messages should remain identical between requests.
This PR relaxes the checks to only focus on the encrypted ID instead. It
also changes the reducer to keep the most rich version of the message
observed during the rollout (this ensures that we don't accidentally
lose the CoT nor summary when available).
## Summary
Some improvements to Windows process-management issues from
https://github.com/openai/codex/pull/15578
- bound the elevated runner pipe-connect handshake instead of waiting
forever on blocking pipe connects
- terminate the spawned runner if that handshake fails, so timeout/error
paths do not leave a stray `codex-command-runner.exe`
- loop on partial `WriteFile` results when forwarding stdin in the
elevated runner, so input is not silently truncated
- fix the concrete HANDLE/SID cleanup paths in the runner setup code
- keep draining driver-backed stdout/stderr after exit until the backend
closes, instead of dropping the tail after a fixed 200ms grace period
- reuse `LocalSid` for SID ownership and add more explanatory comments
around the ownership/concurrency-sensitive code paths
## Why
The original PR fixed a lot of Windows session plumbing, but there were
still a few sharp process-lifecycle edges:
- some elevated runner handshakes could block forever
- the new timeout path could still orphan the spawned runner process
- stdin forwarding still assumed a single `WriteFile` consumed the whole
buffer
- a few raw HANDLE/SID error paths still leaked
- driver-backed output could still lose the last chunk of stdout/stderr
on slower backends
## Validation
- `cargo fmt -p codex-windows-sandbox -p codex-utils-pty`
- `cargo test -p codex-utils-pty`
- `cargo test -p codex-windows-sandbox finish_driver_spawn`
- `cargo test -p codex-windows-sandbox runner_`
Ran a local test matrix of unified-exec and shell_tool tests, all
passing
## Why
This is part 1 of a 7-PR stack to remove direct
`codex_protocol::protocol` usage from `codex-tui` while keeping each
layer reviewable and shippable.
This first layer reduces the size of the later `chatwidget` diff by
mechanically moving MCP startup bookkeeping out of the central widget
file without changing the event shapes or behavior.
## What changed
- Extracted MCP startup status handling into
`tui/src/chatwidget/mcp_startup.rs`.
- Kept the existing core event types in place for this purely mechanical
move.
- Updated the MCP startup tests to import the moved test-only event
types directly.
## Verification
- `cargo test -p codex-tui chatwidget::tests::mcp_startup`
## Why
The paused goal statusline currently points users at `/goal` to unpause
a goal, but bare `/goal` is the summary command and does not change the
goal state. Instead of making `/goal` mutate state only when a goal is
paused, this gives the action an explicit command that reads naturally
in the UI.
## What Changed
- Replace `/goal unpause` with `/goal resume` for reactivating a paused
goal.
- Update the paused goal statusline and `/goal` summary copy to point at
`/goal resume`.
## Why
`agents.max_depth` is a legacy multi-agent v1 guard. Multi-agent v2 uses
task-path routing and its own session/thread limits, so v2 should not
reject nested `spawn_agent` calls just because the thread-spawn depth
has reached the v1 maximum.
Keeping the v1 depth guard active in v2 prevents deeper task trees even
though the v2 path still needs the depth value only for lineage and
task-path metadata.
## What Changed
- Removed the depth-limit rejection from the multi-agent v2
`spawn_agent` handler while still computing child depth for lineage/path
metadata.
- Made the depth-based disabling of legacy `SpawnCsv`/`Collab` tools
apply only when `Feature::MultiAgentV2` is disabled.
- Added `multi_agent_v2_spawn_agent_ignores_configured_max_depth` to
cover a v2 child spawning another agent when `agent_max_depth = 1`,
while the existing v1 depth-limit tests continue to enforce the legacy
behavior.
## Verification
- `cargo test -p codex-core
multi_agent_v2_spawn_agent_ignores_configured_max_depth -- --nocapture`
- `cargo test -p codex-core depth_limit -- --nocapture`
- `cargo test -p codex-core tools::handlers::multi_agents::tests --
--nocapture`
## Summary
Fix the Windows sandbox PTY spawn path to pass the pseudoconsole handle
value directly into `UpdateProcThreadAttribute`.
## Why
Sandboxed `unified_exec` PTY sessions on Windows were failing during
child process startup with `0xc0000142` (`STATUS_DLL_INIT_FAILED`). In
practice this showed up as PowerShell DLL init popups when the sandboxed
background-terminal path tried to launch an interactive shell.
The root cause was that we were passing a pointer to a local `isize`
variable instead of the pseudoconsole handle value in the form Windows
expects for `PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE`.
## Validation
- `cargo build -p codex-windows-sandbox --bins`
- Reproduced the real sandboxed `codex exec` flow with
`windows.sandbox_private_desktop=true`
- Verified a `tty=true` interactive session launched through the normal
PowerShell wrapper, printed `READY`, accepted follow-up stdin, and
exited cleanly
- Confirmed no new `0xc0000142` / `Application Popup` events appeared
after the successful repro
## Summary
- Rewrite migrated external-agent hook commands by replacing the full
hook script path token instead of only the `.claude/hooks/` segment.
- Preserve quoting around the full rewritten target path so script names
with spaces, absolute paths, and shell operators/redirection continue to
work.
- Apply `.claude/settings.local.json` over `.claude/settings.json` for
config, MCP, and plugin migration so local scope matches Claude settings
precedence.
- Skip legacy command markdown without `description` frontmatter,
including README-style docs under `.claude/commands`.
## Root Cause
The previous hook rewrite handled `.claude/hooks/` as a substring
replacement. For absolute source commands, that left the original
project-root prefix before the newly quoted `.codex/hooks` directory,
producing invalid commands like
`project/'project/.codex/hooks'/script.sh`.
The migration also only used project `settings.json` for
config/MCP/plugin decisions, so local settings such as
`disabledMcpjsonServers` could be ignored even though Claude gives local
settings higher precedence than project settings.
## Validation
- `just fmt`
- `cargo test -p codex-external-agent-migration`
- `cargo test -p codex-app-server external_agent_config`
- `just fix -p codex-external-agent-migration`
- `just fix -p codex-app-server`
- `git diff --check`
## Why
The explicit profile path from #20117 is meant for standalone testing,
but it still inherited the
shell cwd and all managed requirements implicitly. The pre-existing
launcher path even called out
that it did not support a separate cwd yet in
[`debug_sandbox.rs`](509453f688/codex-rs/cli/src/debug_sandbox.rs (L174-L179)).
For a standalone command, the useful default is to let the caller choose
the project directory being
tested and to avoid administrator-provided constraints unless the caller
explicitly wants to test
those too.
## What changed
- Add explicit-profile-only `-C/--cd DIR`, and use that cwd for both
profile resolution and command
execution.
- Add explicit-profile-only `--include-managed-config`.
- Make explicit profile mode skip managed requirement sources by
default, including cloud
requirements, MDM requirements, `/etc/codex/requirements.toml`, and the
legacy managed-config
requirements projection.
- Preserve all existing invocations outside the explicit-profile path.
## Stack
1. #20117 `sandbox-ui-profile`
2. #20118 `sandbox-ui-config` --> this PR
Both PRs are additive. Replay JSON is intentionally deferred to a
follow-up design pass.
## Tests ran
- `cargo test -p codex-cli debug_sandbox`
- `cargo test -p codex-cli sandbox_macos_`
- `cargo test -p codex-core
load_config_layers_can_ignore_managed_requirements`
- `cargo test -p codex-core
load_config_layers_includes_cloud_requirements`
- macOS branch-binary smoke on the rebased top of stack: `-C` changed
execution cwd, explicit
profile mode omitted managed proxy env under `env -i`, and
`--include-managed-config` restored it.
- Linux devbox branch-binary smoke on the rebased top of stack: `-C`
changed execution cwd for
built-in and user-defined explicit profiles.
Messages sent with `followup_task` already arrive at their target
recipient promptly (at message boundaries while sampling, or after the
pending tool call completes) -- having `interrupt` is not worth the
added complexity.
## Why
`codex sandbox` is useful for exercising sandbox behavior directly, but
before this stack the CLI
only picked up permission profiles indirectly from the active config.
The existing debug-sandbox path
already compiled `[permissions]` profiles through normal config loading,
as covered by the existing
profile tests in
[`debug_sandbox.rs`](de2ccf9473/codex-rs/cli/src/debug_sandbox.rs (L715-L760)).
This adds the smallest stable entry point first: an explicit profile
selector that reuses the same
config machinery as normal Codex config, so standalone testing becomes
possible without changing
current no-selector behavior.
## What changed
- Add additive `--permissions-profile NAME` support to `codex sandbox
macos|linux|windows`.
- Resolve built-in and user-defined profile names by feeding
`default_permissions` through the
existing config compilation path instead of inventing a sandbox-only
parser.
- Make an explicit selector win over an ambient active profile's legacy
`sandbox_mode`.
- Keep the existing no-selector behavior unchanged.
## Stack
1. #20117 `sandbox-ui-profile` --> this PR
2. #20118 `sandbox-ui-config`
Both PRs are additive. Replay JSON is intentionally deferred to a
follow-up design pass.
## Tests ran
- `cargo test -p codex-cli debug_sandbox`
- `cargo test -p codex-cli sandbox_macos_parses_permissions_profile`
- `cargo test -p codex-core
cli_override_takes_precedence_over_profile_sandbox_mode`
- macOS branch-binary smoke on the rebased top of stack: built-in
`:workspace` and user-defined
profiles both executed successfully through `--permissions-profile`.
- Linux devbox branch-binary smoke on the rebased top of stack: built-in
`:workspace` and
user-defined profiles both executed successfully through
`--permissions-profile`.
## Summary
Starts the process of getting rid of `--full-auto`, with some
concessions:
1. Fully removes the command from the tui, since it just resolves to the
default permissions there, and encourages users to use the one-time
trust flow if they're not in a trusted repo.
2. Marks the command as deprecated in `codex exec`, in case users are
actively relying on this. We'll remove in an upcoming n+X release.
3. Cleans up some of the `codex sandbox` cli logic, to keep supporting
legacy sandbox policies for now.
This isn't the cleanest setup, but I think it is worthwhile to warn
users for one release before hard-removing it.
## Testing
- [x] Updated unit tests
## Summary
- Change `EnvironmentProvider` to return concrete `Environment`
instances instead of `EnvironmentConfigurations`.
- Make `DefaultEnvironmentProvider` provide the provider-visible `local`
environment plus optional `remote` environment from
`CODEX_EXEC_SERVER_URL`.
- Keep `EnvironmentManager` as the concrete cache while exposing its own
explicit local environment for `local_environment()` fallback paths.
## Validation
- `just fmt`
- `git diff --check`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`PermissionProfile` is the canonical runtime permission model in the
Rust workspace, but the Linux sandbox helper still accepted a legacy
`SandboxPolicy` plus separate filesystem and network policy flags. That
translation layer made the helper interface harder to reason about and
left `linux-sandbox`-specific callers and tests coupled to the legacy
policy representation.
This change moves the helper onto `PermissionProfile` directly so the
Linux sandbox plumbing matches the rest of the permission stack.
## What changed
- changed `codex-linux-sandbox` to accept `--permission-profile` and
derive the runtime filesystem and network policies internally
- updated the in-process seccomp and legacy Landlock path in
`codex-rs/linux-sandbox` to operate on `PermissionProfile`
- updated Linux sandbox argv construction in `codex-rs/sandboxing`,
`codex-rs/core`, and the CLI debug sandbox path to pass the canonical
profile instead of serializing compatibility policy projections
- simplified the Linux sandbox tests to build the exact permission
profile under test, including the managed-proxy path and
direct-runtime-enforcement carveout coverage
- removed helper-local `SandboxPolicy` usage from `bwrap` tests where
`FileSystemSandboxPolicy` is already the value being exercised
## Testing
- `cargo test -p codex-sandboxing`
- `cargo test -p codex-linux-sandbox` (on this macOS host, the crate
compiled cleanly and its Linux-only tests were cfg-gated)
- `cargo test -p codex-core --no-run`
- `cargo test -p codex-cli --no-run`
## Summary
Amazon Bedrock Mantle's OpenAI-compatible endpoint now lives under
`/openai/v1`, and the GPT-5.4 Mantle model ID no longer uses the `-cmb`
suffix. This updates Codex's built-in Bedrock provider configuration so
generated providers and the static Bedrock catalog use the current
endpoint and model ID.
## Changes
- Update the Bedrock Mantle base URL from
`https://bedrock-mantle.{region}.api.aws/v1` to
`https://bedrock-mantle.{region}.api.aws/openai/v1`.
- Update the Amazon Bedrock default base URL in
`codex-model-provider-info`.
- Change the Bedrock GPT-5.4 catalog slug from `openai.gpt-5.4-cmb` to
`openai.gpt-5.4`.
- Align provider and catalog tests with the new URL and model ID.
## Test Plan
- Manual smoke test:
```shell
target/debug/codex \
-m openai.gpt-5.4 \
-c 'model_provider="amazon-bedrock"' \
-c 'model_providers.amazon-bedrock.aws.region="us-west-2"'
```
follow up of #19442. The app server now exposes provider-derived bounds
through a new v2 `modelProvider/read` method. The response reports the
configured provider map key as `modelProvider` and returns the effective
capability booleans so clients can align their UI with the same
provider-owned limits used by core.
Fixes test that often fails locally when running `cargo test`
- Add an app-server test helper that combines managed-config isolation
with custom env overrides.
- Isolate `HOME` / `USERPROFILE` in plugin-list workspace settings tests
so host home marketplaces do not affect results.
Fix for #19925
Restore the `Working` indicator after a streamed final answer finishes
when a user steer message is sent.
Add regression coverage for long output plus a mid-stream steer:
`cargo test -p codex-tui
final_answer_completion_restores_status_indicator_for_pending_steer`
Duplication/testing steps:
1. Start a new thread and ask for a long response.
2. While the response is streaming, submit a steer message.
3. When the first response finishes, observe whether `Working...` is
shown while waiting for the steer message response.
## Summary
This fixes the CI regression introduced by
[#20040](https://github.com/openai/codex/pull/20040).
That PR migrated several `apply_patch_cli` tests from direct
`codex.submit(Op::UserTurn { ... })` calls to `harness.submit(...)`.
`harness.submit()` waits for `TurnComplete` before returning, which
drains the same event stream that these tests use to assert `TurnDiff`,
`PatchApplyUpdated`, and related live events. The regressed tests then
timed out waiting for events that had already been consumed.
This change restores a no-wait submit path for the event-observing
`apply_patch_cli` tests so they can watch the turn stream directly
again.
## What Changed
- added a local `submit_without_wait(...)` helper in
`codex-rs/core/tests/suite/apply_patch_cli.rs`
- switched the `apply_patch_cli` tests that assert live turn events back
to that helper
- left the profile-backed `harness.submit(...)` migration in place for
tests that only care about final filesystem or tool output state
## Why macOS Looked Green
In the failing run
[25084487331](https://github.com/openai/codex/actions/runs/25084487331),
`//codex-rs/core:core-all-test` was cached on macOS, so the regressed
tests were not rerun there. The Linux GNU, Linux MUSL, and Windows Bazel
jobs reran the target and exposed the failure.
## Verification
- `cargo test -p codex-core apply_patch_ -- --nocapture`
- previously failing local cases now pass again:
- `apply_patch_cli_move_without_content_change_has_no_turn_diff`
- `apply_patch_turn_diff_for_rename_with_content_change`
- `apply_patch_aggregates_diff_across_multiple_tool_calls`
## Why
Unsupported features must fail closed and Codex must not expose
OpenAI-hosted fallback paths when the active provider cannot support
them. In practice, Bedrock should not surface app connectors, MCP
servers, tool search/suggestions, image generation, web search, or JS
REPL until those paths are explicitly supported for that provider.
This PR moves that decision into provider-owned capability metadata
instead of scattering Bedrock-specific checks across callers.
## What changed
- Adds `ProviderCapabilities` to `codex-model-provider`, with default
support for existing providers and a Bedrock override that disables
unsupported launch surfaces.
- Adds `ToolCapabilityBounds` to `codex-tools` so provider capability
limits can clamp otherwise-enabled tool config.
- Applies capability bounds when building session and review-thread tool
config.
- Routes MCP/app connector configuration through
`McpManager::mcp_config`, which filters configured MCP servers and app
connectors based on the active provider.
- Updates app-server MCP list/read paths to use the filtered MCP config.
- Adds coverage for default provider capabilities, Bedrock disabled
capabilities, and optional tool-surface clamping.
## Testing
built locally and verified that bedrock responses api now return without
errors calling unsupported tools.
## Why
This PR expands the migration path so Codex can detect and import MCP
server config, hooks, commands, and subagents configs in a Codex-native
shape.
## What changed
- Added a `codex-external-agent-migration` crate that owns conversion
logic for external-agent MCP servers, hooks, commands, and subagents.
- Extended the app-server external-agent config detection/import API
with migration item types for MCP server config, hooks, commands, and
subagents.
## Migration strategy
The migration is intentionally conservative: Codex only imports
external-agent config that can be represented safely in Codex today.
Unsupported or ambiguous config is skipped instead of being partially
translated into behavior that may not match the source system.
- **MCP servers**: import supported stdio and HTTP MCP server
definitions into `mcp_servers`. Disabled servers and servers filtered
out by source `enabledMcpjsonServers` / `disabledMcpjsonServers` are
skipped. Project-scoped MCP entries from `.claude.json` are included
when they match the repo path.
- **Hooks**: import only supported command hooks into
`.codex/hooks.json`. Unsupported hook features such as conditional
groups, async handlers, prompt/http hooks, or unknown fields are
skipped. Referenced hook scripts are copied into `.codex/hooks/`,
preserving any existing target scripts.
- **Commands**: import supported external commands as Codex skills under
`.agents/skills/source-command-*`. Commands that rely on source runtime
expansion such as `$ARGUMENTS`, `$1`, `@file` references, shell
interpolation, or colliding generated names are skipped.
- **Subagents**: import valid subagent Markdown files into
`.codex/agents/*.toml` when they have the minimum Codex agent fields.
Source model names are not migrated, so imported agents keep the user’s
Codex default model; compatible reasoning effort and sandbox mode are
migrated when present.
- **Skills and project guidance**: copy missing skill directories into
`.agents/skills` and migrate `CLAUDE.md` guidance into `AGENTS.md`,
rewriting source-agent terminology to Codex terminology where
appropriate.
- **Detection details**: detected migration items include lightweight
details for UI preview, such as MCP server names, hook event names,
generated command skill names, and subagent names. Import still
recomputes from disk instead of trusting details as the source of truth.
- Adds focused coverage for the new migration behavior and app-server
import flow.
## Verification
- `cargo test -p codex-external-agent-migration`
- `cargo test -p codex-hooks`
- `cargo test -p codex-app-server external_agent_config`
- `just bazel-lock-check`
## Summary
- Add `disable_tool_suggest` to app and plugin config, schema, and
TypeScript output
- Exclude disabled connectors and plugins from tool suggestion discovery
- Persist "never show again" tool-suggestion choices back into
`config.toml`
- Update config docs and add coverage for connector and plugin
suppression
## Testing
- Added and updated unit tests for config persistence and tool-suggest
filtering
- Not run (not requested)
## Summary
- Removes `SandboxPolicy` from the hooks test suite.
- Submits hook-related turns with explicit `PermissionProfile` values
for disabled, read-only, and workspace-write cases.
- Preserves the managed-network hook test by configuring and submitting
a workspace-write profile with enabled network, allowing the existing
requirements-backed proxy path to remain covered.
## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
- Removes `SandboxPolicy` from the RMCP client test suite.
- Adds shared read-only user-turn helpers that submit
`PermissionProfile::read_only()` plus the legacy compatibility
projection required by the current `Op::UserTurn` shape.
- Keeps sandbox metadata assertions intact by deriving the expected
legacy `sandboxPolicy` value from the same read-only profile used for
the turn.
## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
- Removes the remaining `SandboxPolicy` usage from the compaction test
suite.
- Adds a small local helper for direct `Op::UserTurn` construction so
these tests send `PermissionProfile::Disabled` plus the legacy
compatibility projection required by the protocol field.
- Keeps the existing danger/full-access behavior while exercising the
canonical permission profile path.
## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
- Updates the zsh-fork test helper to configure `PermissionProfile`
directly instead of constructing a legacy `SandboxPolicy`.
- Sends permission-profile-backed turns from the skill approval zsh-fork
tests so the runtime and request path exercise the canonical permissions
model.
- Leaves the broader approvals suite on legacy policies for now, except
for the zsh-fork test that shares this helper.
## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
This migrates the macOS request-permissions tool tests from legacy
`SandboxPolicy` setup to `PermissionProfile` setup. The tests still
exercise the same workspace-write baseline and request-permission
grants, but the canonical permissions value is now the profile.
## Changes
- Replaces the `workspace_write_excluding_tmp()` helper with a
`PermissionProfile::workspace_write_with()` helper.
- Applies test config through `Permissions::set_permission_profile()`.
- Uses `turn_permission_fields()` for `Op::UserTurn` compatibility
fields.
- Removes the `SandboxPolicy` import from `request_permissions_tool.rs`.
## Verification
- `cargo check -p codex-core --tests`
## Summary
This removes the explicit `SandboxPolicy` constructors from
`core/tests/suite/prompt_caching.rs`. The tests still exercise the same
prompt-cache invariants across permission and turn-context changes, but
the permission source is now `PermissionProfile`.
## Changes
- Uses `PermissionProfile::workspace_write_with()` for workspace-write
override scenarios.
- Uses `PermissionProfile::Disabled` for the no-sandbox per-turn
override.
- Projects profiles through `turn_permission_fields()` or
`to_legacy_sandbox_policy()` only to populate compatibility fields on
existing ops.
- Removes the `SandboxPolicy` import from `prompt_caching.rs`.
## Verification
- `cargo check -p codex-core --tests`
## Summary
This migrates `core/tests/suite/exec_policy.rs` away from legacy
`SandboxPolicy` turn construction. These tests all use no-sandbox turns
to exercise exec-policy behavior, so `PermissionProfile::Disabled` is
the canonical representation.
## Changes
- Replaces direct `SandboxPolicy::DangerFullAccess` turn fields with
`PermissionProfile::Disabled`.
- Uses `turn_permission_fields()` to populate the compatibility
`sandbox_policy` field required by `Op::UserTurn`.
- Removes the `SandboxPolicy` import from `exec_policy.rs`.
## Verification
- `cargo check -p codex-core --tests`
## Summary
This removes another test-only `SandboxPolicy` dependency by configuring
`permissions_messages.rs` with a `PermissionProfile` directly. The test
still verifies the rendered compatibility permissions text, but now
obtains the legacy projection from the loaded `Config` rather than using
`SandboxPolicy` as the source of truth.
## Changes
- Builds the workspace-write test setup with
`PermissionProfile::workspace_write_with()`.
- Applies that profile through `Permissions::set_permission_profile()`.
- Uses `Config::legacy_sandbox_policy()` only for the expected
`PermissionsInstructions` compatibility rendering.
## Verification
- `cargo check -p codex-core --tests`
## Summary
This continues the test-side migration away from `SandboxPolicy` by
removing the remaining legacy policy setup in
`core/tests/suite/tools.rs`. The affected test was already modeling a
profile-backed filesystem policy with a deny-read glob, so configuring
the test through `Permissions::set_permission_profile()` is a better
match for the behavior being exercised.
## Changes
- Drops the `SandboxPolicy` import from `core/tests/suite/tools.rs`.
- Configures the glob deny-read shell test directly with a
`PermissionProfile` instead of creating a legacy read-only policy first.
- Submits the test turn with the session permission profile so the
deny-read glob remains active for the command under test.
## Verification
- `cargo check -p codex-core --tests`
## Why
The core item tests still had a cluster of plan-mode `Op::UserTurn`
literals that used `SandboxPolicy::DangerFullAccess` and omitted
`permission_profile`. These tests are validating emitted item lifecycle
events, so keeping them on the legacy sandbox-only turn shape adds noise
to the broader permissions migration without testing legacy behavior.
## What Changed
- Adds a local `disabled_plan_turn()` helper that preserves the existing
`std::env::current_dir()` turn cwd behavior.
- Uses `turn_permission_fields(PermissionProfile::Disabled, cwd)` to
populate both the compatibility `sandbox_policy` and canonical
`permission_profile` fields.
- Replaces the plan-mode hand-built turns in
`codex-rs/core/tests/suite/items.rs`, removing all `SandboxPolicy`
references from that file and reducing remaining `codex-rs/core/tests`
`SandboxPolicy` files from 16 to 15.
## Verification
- `cargo check -p codex-core --tests`
## Why
This stack is retiring direct `SandboxPolicy` construction from tests so
core coverage exercises the same `PermissionProfile` turn path used by
runtime code. `safety_check_downgrade.rs` still submitted each test turn
as `SandboxPolicy::DangerFullAccess` with no permission profile, even
though the tests are about model verification/reroute behavior rather
than legacy sandbox conversion.
## What Changed
- Adds a local `disabled_text_turn()` helper that derives both the
compatibility `sandbox_policy` and canonical `permission_profile` from
`PermissionProfile::Disabled`.
- Replaces repeated hand-built `Op::UserTurn` literals in
`codex-rs/core/tests/suite/safety_check_downgrade.rs` with that helper.
- Removes all `SandboxPolicy` references from the safety-check suite,
reducing the remaining `codex-rs/core/tests` files that mention
`SandboxPolicy` from 17 to 16.
## Verification
- `cargo check -p codex-core --tests`
## Why
This stack is removing direct `SandboxPolicy` usage from test code so
new tests exercise the same `PermissionProfile` path that runtime code
now treats as canonical. `view_image.rs` still built `Op::UserTurn`
requests with `SandboxPolicy::DangerFullAccess` and no permission
profile, which kept another core test module on the legacy turn shape.
## What Changed
- Adds a small `disabled_user_turn()` helper for the view-image suite
that derives the compatibility `sandbox_policy` and canonical
`permission_profile` from `PermissionProfile::Disabled`.
- Replaces repeated direct `Op::UserTurn` literals in
`codex-rs/core/tests/suite/view_image.rs` with that helper.
- Removes all `SandboxPolicy` references from `view_image.rs`, reducing
the remaining `codex-rs/core/tests` files that mention `SandboxPolicy`
from 18 to 17.
## Verification
- `cargo check -p codex-core --tests`
## Summary
- Migrates `model_switching.rs` and `personality.rs` direct
`Op::UserTurn` construction from legacy `SandboxPolicy` literals to
`PermissionProfile`-backed turn fields.
- Adds small local helpers in each file so tests keep asserting
model/personality behavior without repeating permission plumbing.
- Reduces `rg -l '\bSandboxPolicy\b' codex-rs/core/tests` from 20 files
to 18; `codex-rs/tui` remains at zero `SandboxPolicy` references.
## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
# Why
`plugin_hook_sources_run_with_plugin_env_and_plugin_source` can still
fail on Windows after the earlier file-based assertion cleanup because
the hook process itself occasionally exceeds the old 5s timeout under CI
load. When that happens, the hook run ends as `Failed` before the test
can inspect its structured output.
The Windows Bazel failure showed the hook run itself failing after
nearly 8 seconds:
```text
---- engine::tests::plugin_hook_sources_run_with_plugin_env_and_plugin_source stdout ----
thread 'engine::tests::plugin_hook_sources_run_with_plugin_env_and_plugin_source' panicked at hooks/src\engine\mod_tests.rs:428:5:
assertion failed: `(left == right)`
Diff < left / right > :
<Failed
>Completed
...
test result: FAILED. 78 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 7.96s
```
# What
- raise the flaky plugin hook env test timeout from 5s to 10s so it
matches the other executed hook tests in this module
# Validation
- `cargo test -p codex-hooks`
## Summary
- Replace legacy sandbox config setup in delegate and telemetry tests
with direct `PermissionProfile` configuration.
- Move no-sandbox and read-only test turns in `tools.rs`,
`code_mode.rs`, `user_shell_cmd.rs`, and `model_visible_layout.rs` from
legacy `SandboxPolicy` values to `PermissionProfile` helpers, while
leaving the deny-glob read-only compatibility case for a later targeted
cleanup.
- Use `PermissionProfile::read_only()` where tests need managed
read-only behavior and `PermissionProfile::Disabled` where they
intentionally need no sandbox.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 27
files after #20013 to 22 files.
## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
## Summary
- Migrate another batch of direct `Op::UserTurn` test construction from
legacy `SandboxPolicy` values to `PermissionProfile` inputs via
`turn_permission_fields()`.
- Replace a one-off read-only `SandboxPolicy` bridge in the macOS exec
test with `PermissionProfile::read_only()`.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 32
files at the start of the cleanup stack to 27 files.
## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
- `just fix -p codex-core`
## Summary
- Add `turn_permission_fields()` so tests that construct `Op::UserTurn`
directly can provide a canonical `PermissionProfile` while still filling
the required legacy `sandbox_policy` compatibility field.
- Migrate direct user-turn construction in core integration tests from
`SandboxPolicy::DangerFullAccess` to `PermissionProfile::Disabled`.
- Continue reducing direct `SandboxPolicy` usage in
`codex-rs/core/tests`, from 41 files after #20010 to 32 files in this
PR.
## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
- `just fix -p core_test_support`
- `just fix -p codex-core`
## Why
The `codex-issue-digest` skill was producing more detail than the daily
digest needed, and broad all-area digests could miss active issues. In
particular, issue #16088 had substantial recent comments and reactions
but did not appear in the weekly all-areas output because GitHub search
was using default relevance ranking and the collector could exhaust its
candidate cap before later search queries got a fair sample.
That made the digest look quieter than the underlying user activity and
made threshold tuning misleading.
## What changed
- Make the digest summary headline-first and summary-only by default.
- Add an explicit opt-in flow for `## Details`, so the issue table is
shown only when requested or when the prompt asks for details upfront.
- Update the collector to request GitHub issue search results with
`sort=updated` and `order=desc`.
- Apply the search candidate cap per query instead of globally across
all queries.
- Bump the collector script version to `3`.
- Add tests that cover updated sorting and per-query candidate limits.
## Verification
- `pytest
.codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py`
- `ruff check
.codex/skills/codex-issue-digest/scripts/collect_issue_digest.py
.codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py`
- `git diff --check`
- Reran the all-areas weekly collector and confirmed #16088 is now
included with `55` interactions.
## Why
Remote-control app-server enrollments have both an internal server id
and the environment id exposed to remote-control clients. App-server
clients need one current status snapshot that says whether remote
control is usable and which environment id, if any, is exposed.
A temporary websocket disconnect is not itself an identity change.
Account changes, stale enrollment invalidation, successful
re-enrollment, and missing ChatGPT auth are meaningful status changes.
Disabled remote control remains `disabled` regardless of auth or SQLite
state. SQLite startup failure disablement and enrollment persistence
failures are handled in #20068; this PR reports the resulting effective
status to clients.
## What changed
- Adds v2 `remoteControl/status/changed` carrying `state` and
`environmentId`.
- Adds `RemoteControlConnectionState` values: `disabled`, `connecting`,
`connected`, and `errored`.
- Exposes remote-control status updates through `RemoteControlHandle`
using a Tokio watch channel.
- Always sends the current remote-control status snapshot to newly
initialized app-server clients.
- Broadcasts status changes to initialized app-server clients when state
or environment id changes.
- Treats missing ChatGPT auth as an `errored` status while leaving it
retryable because auth can change at runtime.
- Clears `environmentId` when enrollment is cleared for account changes,
auth loss, stale backend invalidation, or disabled remote control.
- Updates app-server protocol schema fixtures, generated TypeScript,
app-server README, remote-control tests, and TUI exhaustive notification
matches.
## Stack
- Builds on #20068.
## Verification
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server transport::remote_control --lib`
- `cargo check -p codex-tui`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
Right now, if Codex winds up in a state with auth but it can't refresh
the token, the user is left with an unhelpful message that says to log
out and log back in again.
Ultimately, we should prevent that from happening but if it does,
returning None will allow the caller to redirect the user back to the
login page
## Summary
- Add `PermissionProfile`-based turn submission helpers to
`core_test_support`, while keeping the legacy `SandboxPolicy` helper for
tests that intentionally exercise legacy fallback behavior.
- Switch the default `TestCodex::submit_turn()` path to send a real
`PermissionProfile` plus the required legacy compatibility projection in
`Op::UserTurn`.
- Migrate straightforward app/search/shell/truncation tests from
`SandboxPolicy::{DangerFullAccess, ReadOnly}` to
`PermissionProfile::{Disabled, read_only}`.
- Add a TUI compatibility projection helper for legacy app-server fields
so non-legacy writable roots are preserved instead of being downgraded
to read-only.
- Fix remote start/resume/fork sandbox-mode projection to classify any
managed profile with writable roots as workspace-write, not only
profiles that can write `cwd`.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 47
files to 41 files without changing production behavior.
## Testing
- `cargo check -p codex-core --tests`
- `cargo test -p codex-tui
compatibility_profile_preserves_unbridgeable_write_roots`
- `cargo test -p codex-tui
sandbox_mode_preserves_non_cwd_write_roots_for_remote_sessions`
- `just fmt`
- `just fix -p core_test_support`
- `just fix -p codex-core`
## Why
The proxy matches allow and deny rules against normalized host strings.
Scoped IPv6 literals can arrive in equivalent forms, such as
`fd00::1%eth0`, `[fd00::1%eth0]`, or `[fd00::1%25eth0]`. Policy should
canonicalize those spellings without erasing scope granularity: an
unscoped rule like `fd00::1` should still cover scoped requests for that
address, while a scoped rule like `fd00::1%eth0` should remain exact to
that scope.
## What changed
- preserve IPv6 scope IDs during host normalization and canonicalize
`%25scope` to `%scope`
- match policy against the exact normalized host plus the unscoped IP
base for scoped literals
- keep local-address explicit allow checks aligned with the same
scoped/unscoped semantics
- add focused coverage for scoped IPv6 normalization, scoped allow
rules, and scoped deny rules in `network-proxy`
## Security impact
A request cannot bypass a broad deny rule by adding an IPv6 scope
suffix. At the same time, scoped policy remains precise:
`deny=fd00::1%eth0` affects that scoped spelling without collapsing
`fd00::1%eth1` onto the same key, and `allow=fe80::1%eth0` does not
implicitly allow other scopes.
## Verification
- `just fmt`
- `cargo test -p codex-network-proxy`
- `just fix -p codex-network-proxy`
- `git diff --check`
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: evawong-oai <evawong@openai.com>
The test was flaky because it was checking the right thing in a
roundabout way.
What it wanted to prove:
- plugin hooks receive the right environment variables.
What it actually did:
1. Run a plugin hook.
2. Have that hook write those env vars into a temporary `env.json` file.
3. After the hook finished, read `env.json` back from disk.
On Windows, that last file was sometimes not there when the test tried
to read it, so the test failed with `read env log: file not found`. The
hook system itself was not what the test failure was directly proving;
the test was failing on the extra filesystem side effect it introduced.
The fix is to stop using a temp file as the proof mechanism. The hook
now prints the env values in its normal structured output, and the test
asserts on the output that the hook engine already captures. So we still
verify the same behavior, but without depending on a separate file being
created and read back correctly on Windows.
This PR adds a new feature to the `/plugins` menu that gives users the
ability to add new plugin marketplaces. It introduces an Add Marketplace
tab to the right of installed marketplaces, a source prompt, loading and
error states, and the app-server request flow needed to perform the
install. After a successful `marketplace/add`, the popup refreshes back
into the newly added marketplace tab so the new plugins are immediately
visible.
- Add an Add Marketplace tab to the `/plugins` menu
- Prompt for marketplace source input from git repo, URL, or local path
- Show loading and error states during `marketplace/add`
- Refresh plugin data after success and switch into the newly added
marketplace tab
- Add tests and snapshot updates
## Why
Plugins can bundle lifecycle hooks, but Codex previously only discovered
hooks from user, project, and managed config layers. This adds the
plugin discovery and runtime plumbing needed for plugin-bundled hooks
while keeping execution behind the `plugin_hooks` feature flag.
## What
- Discovers plugin hook sources from each plugin's default
`hooks/hooks.json`.
- Supports `plugin.json` manifest `hooks` entries as either relative
paths or inline hook objects.
- Plumbs discovered plugin hook sources through plugin loading into the
hook runtime when `plugin_hooks` is enabled.
- Marks plugin-originated hook runs as `HookSource::Plugin`.
- Injects `PLUGIN_ROOT` and `CLAUDE_PLUGIN_ROOT` into plugin hook
command environments.
- Updates generated schemas and hook source metadata for the plugin hook
source.
## Stack
1. This PR - openai/codex#19705
2. openai/codex#19778
3. openai/codex#19840
4. openai/codex#19882
## Reviewer Notes
- Core logic is in `codex-rs/core-plugins/src/loader.rs` and
`codex-rs/hooks/src/engine/discovery.rs`
- Moved existing / adding new tests to
`codex-rs/core-plugins/src/loader_tests.rs` hence the large diff there
- Otherwise mostly plumbing and minor schema updates
### Core Changes
The `codex-rs/core` changes are limited to wiring plugin hook support
into existing core flows:
- `core/src/session/session.rs` conditionally pulls effective plugin
hook sources and plugin hook load warnings from `PluginsManager` when
`plugin_hooks` is enabled, then passes them into `HooksConfig`.
- `core/src/hook_runtime.rs` adds the `plugin` metric tag for
`HookSource::Plugin`.
- `core/config.schema.json` picks up the new `plugin_hooks` feature
flag, and `core/src/plugins/manager_tests.rs` updates fixtures for the
added plugin hook fields.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Rollout traces need an identifier that can be used to correlate a Codex
inference with upstream Responses API, proxy, and engine logs. The
reduced trace model already exposed `upstream_request_id`, but it was
being populated from the Responses API `response.id`. That value is
useful for `previous_response_id` chaining, but it is not the transport
request id that upstream systems key on.
This PR separates those concepts so trace consumers can reliably answer
both questions:
- which Responses API response did this inference produce?
- which upstream request handled it?
## Structure
The change keeps the upstream request id at the same lifecycle level as
the provider stream:
- `codex-api` captures the `x-request-id` HTTP response header when the
SSE stream is created and exposes it on `ResponseStream`. Fixture and
websocket streams set the field to `None` because they do not have that
HTTP response header.
- `codex-core` carries that stream-level id into `InferenceTraceAttempt`
when recording terminal stream outcomes. Completed, failed, cancelled,
dropped-stream, and pre-response error paths all record the id when it
is available.
- `rollout-trace` now records both identifiers in raw terminal inference
events and response payloads: `response_id` for the Responses API
`response.id`, and `upstream_request_id` for `x-request-id`.
- The reducer stores both fields on `InferenceCall`. It also uses
`response_id` for `previous_response_id` conversation linking, which
removes the old accidental dependency on the misnamed
`upstream_request_id` field.
- Terminal inference reduction now consumes the full terminal payload
(`InferenceCompleted`, `InferenceFailed`, or `InferenceCancelled`) in
one place. That keeps status, partial payloads, response ids, and
upstream request ids consistent across success, failure, cancellation,
and late stream-mapper events.
## Why This Shape
`x-request-id` is a property of the HTTP/provider response envelope, not
an SSE event. Capturing it once in `codex-api` and plumbing it through
terminal trace recording avoids trying to infer the value from stream
contents, and it preserves the id even when the stream fails or is
cancelled after only partial output.
Keeping `response_id` separate from `upstream_request_id` also makes the
reduced trace model less surprising: `response_id` remains the
conversation-continuation id, while `upstream_request_id` is the
operational correlation id for upstream debugging.
## Validation
The PR updates trace and reducer coverage for:
- reading `x-request-id` from SSE response headers;
- storing the true upstream request id on completed inference calls;
- preserving upstream request ids for cancelled and late-cancelled
inference streams;
- keeping `previous_response_id` reconstruction tied to `response_id`
rather than transport request ids.
## Why
Remote control depends on the app-server SQLite state DB for persisted
enrollment identity. If the state DB cannot be opened at startup,
continuing with remote control enabled leaves the process in a
misleading state where enrollment identity cannot be read or persisted.
Feature-disabled remote control remains disabled regardless of SQLite
state. This only changes the case where remote control is requested but
the SQLite state DB is unavailable.
## What changed
- Logs SQLite state DB initialization failures instead of dropping the
error silently.
- Treats remote control as effectively disabled when the SQLite state DB
is unavailable.
- Prevents `RemoteControlHandle::set_enabled(true)` from enabling remote
control later in the same process if the state DB was unavailable at
startup.
- Keeps the existing behavior that disabled remote control does not
validate or connect to the remote-control URL.
- Makes persisted enrollment load/update failures propagate as
remote-control errors instead of silently falling back to in-memory
state.
- Makes the direct websocket connection path fail when called without a
SQLite state DB.
- Adds coverage for startup without a state DB, later handle enablement
with no state DB, and direct websocket connection without a state DB.
## Verification
- `cargo test -p codex-app-server transport::remote_control --lib`
- `just fix -p codex-app-server`
## Why
MultiAgentV2 `wait_agent` currently clamps short waits to a fixed 10
second minimum. That default is still useful for preventing tight
polling loops, but it is too rigid for environments that need faster
mailbox wake-up checks or a larger minimum to discourage frequent
polling.
This PR makes the minimum wait timeout configurable from the existing
MultiAgentV2 feature config section, so operators can tune the behavior
without changing the legacy multi-agent tool surface.
## What Changed
- Added `features.multi_agent_v2.min_wait_timeout_ms`.
- Defaulted the new setting to the existing 10 second floor.
- Validated the configured value as `1..=3600000`, matching the existing
one hour maximum wait bound.
- Applied the configured minimum to MultiAgentV2 `wait_agent` runtime
clamping.
- Plumbed the configured minimum into the `wait_agent` tool schema,
including the effective default when the minimum is above the normal 30
second default.
- Regenerated `core/config.schema.json`.
## Verification
- `cargo test -p codex-features`
- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib multi_agent_v2`
- `just fix -p codex-core`
## Why
The proxy checks the requested host before opening the upstream
connection, but DNS can resolve an allowed hostname to a loopback,
private, or other non-public address after that first decision. Without
a final check on the actual socket target, a request that looks
acceptable at the hostname layer can still connect to a local service
once resolution completes.
## What changed
- add a shared TCP connector check for direct proxy egress
- use that path for HTTP, `CONNECT`, SOCKS5, and MITM upstream
connections
- keep configured upstream proxy hops on the existing proxy path
- add direct-connector coverage for allowed and rejected local targets
## Security impact
Direct proxy egress now rechecks the resolved socket address before
connecting, closing the gap between hostname policy evaluation and the
final network target.
## Verification
- `cargo test -p codex-network-proxy`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Agent Identity sessions can represent Business and Enterprise ChatGPT
workspaces, but cloud requirements were skipped before fetch. That meant
workspace-managed requirements were not loaded for Agent Identity even
when the JWT carried the same account identity and plan information that
normal ChatGPT token auth exposes.
This PR now sits on top of the Agent Identity stack through
[#19764](https://github.com/openai/codex/pull/19764). Because
[#19763](https://github.com/openai/codex/pull/19763) moved task
registration into Agent Identity auth loading, cloud requirements no
longer needs a separate runtime-initialization step before building the
backend client.
## What changed
- Stop skipping `CodexAuth::AgentIdentity` in the cloud requirements
loader.
- Share the cloud requirements eligibility check between startup load
and background cache refresh.
- Rely on eagerly loaded Agent Identity auth so backend requests can
attach task-scoped `AgentAssertion` headers.
- Decode Agent Identity JWT `plan_type` as the auth-layer plan type,
then convert it through a shared `auth::PlanType` -> `account::PlanType`
mapping.
- Add the missing serde alias for the `education` plan string and add
coverage for raw Agent Identity plan aliases such as `hc` and
`education`.
## Testing
- `cargo test -p codex-agent-identity -p codex-login -p
codex-cloud-requirements -p codex-protocol`
## Why
Initialized app-server RPCs no longer need to bottleneck behind one
request processor path. Running them concurrently improves
responsiveness, but several request families still mutate shared state
or depend on ordered side effects. Those stateful families need an
auditable serialization contract so concurrency does not reorder thread,
config, auth, command, watcher, MCP, or similar state transitions.
This PR keeps that boundary explicit: stateful work is serialized by the
smallest useful key, while intentionally read-only or externally
concurrent work remains unkeyed. In particular, `thread/list` and
`thread/turns/list` explicitly have no serialization because they
primarily read append-only rollout storage and should continue to be
served concurrently.
## What changed
- Adds `ClientRequest::serialization_scope()` in `app-server-protocol`
and requires every client request definition to declare its
serialization behavior.
- Introduces keyed request scopes for thread, thread path, command exec
process, fuzzy search session, fs watch, MCP OAuth, and global state
buckets such as config, account auth, memory, and device keys.
- Routes initialized app-server RPCs through per-key FIFO serialization
while allowing unkeyed initialized requests to run concurrently.
- Cancels in-flight initialized RPC work when the connection disconnects
or the app-server exits so spawned request tasks do not outlive their
session.
- Adds focused coverage for representative keyed and unkeyed
serialization scopes, including explicitly concurrent
`thread/turns/list` behavior.
## Validation
- Added protocol tests for representative keyed serialization scopes and
intentionally unkeyed request families.
- Added app-server request serialization tests covering per-key FIFO
behavior, concurrent unkeyed execution, disconnect shutdown, and config
read-after-write ordering.
- Local focused protocol validation after the latest rebase is currently
blocked by packageproxy failing to resolve locked `rustls-webpki
0.103.13`; CI is expected to provide the full validation signal.
## Why
The log DB writer batches tracing events before inserting them into
SQLite, but `tokio::time::interval` produces an immediate first tick.
That meant the inserter could flush the first accepted log entry before
`batch_size` was reached, making
`configured_batch_size_flushes_without_explicit_flush` timing-sensitive
in CI.
## What Changed
- Consume the interval's startup tick before entering the inserter loop,
so interval flushing starts after the configured delay.
- Remove the test's startup sleep, which was masking the race instead of
proving the batch-size behavior.
## Validation
- `cargo test -p codex-state`
- `cargo test -p codex-state
configured_batch_size_flushes_without_explicit_flush` passed 3
consecutive focused runs
- PR checks passed across `rust-ci`, Bazel, `ci`, `sdk`, `cargo-deny`,
Codespell, blob-size policy, and CLA
## Why
The Linux managed-proxy bridge helpers are long-lived child processes in
the sandbox networking path. Before this change they stayed dumpable and
the network seccomp profile did not block cross-process memory syscalls,
so another same-user process could potentially inspect or modify bridge
memory instead of interacting only through the intended proxy interface.
## What changed
- reuse the shared `codex-process-hardening` helper to mark bridge
helper children non-dumpable before they begin serving
- deny `process_vm_readv` and `process_vm_writev` in the existing
network seccomp filter
## Security impact
Bridge helpers are less exposed to same-user cross-process inspection or
memory writes, which reduces the chance that sandboxed code can
interfere with proxy support processes outside the intended IPC path.
## Verification
- `cargo test -p codex-process-hardening`
- `cargo test -p codex-linux-sandbox`
- attempted `cargo check -p codex-linux-sandbox --target
x86_64-unknown-linux-gnu`; blocked on missing `x86_64-linux-gnu-gcc` on
this macOS host
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Slow Codex turns are easier to debug when token usage is visible in the
trace itself, without joining against separate analytics. This adds
token usage to existing turn-handling spans for regular user turns only.
[Example
turn](https://openai.datadoghq.com/apm/trace/9d353efa2cb5de1f4c5b93dc33c3df04?colorBy=service&graphType=flamegraph&shouldShowLegend=true&sort=time&spanID=3555541504891512675&spanViewType=metadata&traceQuery=)
<img width="1447" height="967" alt="Screenshot 2026-04-24 at 3 03 07 PM"
src="https://github.com/user-attachments/assets/ab7bb187-e7fc-41f0-a366-6c44610b2b2c"
/>
## What Changed
Added response-level token fields on completed handle_responses spans:
gen_ai.usage.input_tokens
gen_ai.usage.cache_read.input_tokens
gen_ai.usage.output_tokens
codex.usage.reasoning_output_tokens
codex.usage.total_tokens
Added aggregate token fields on regular turn spans:
codex.turn.token_usage.*
Added an explicit regular-turn opt-in via
SessionTask::records_turn_token_usage_on_span() so this is not coupled
to span-name strings.
## Testing
- `cargo test -p codex-otel`
- `cargo test -p codex-core
turn_and_completed_response_spans_record_token_usage`
- `just fmt`
- `just fix -p codex-core`
- `just fix -p codex-otel`
- Manual local Electron/app-server smoke test: regular user turn emits
the new span fields
Known status: `cargo test -p codex-core` was attempted and failed in
unrelated existing areas: config approvals, request-permissions,
git-info ordering, and subagent metadata persistence.
## Why
The migration away from `SandboxPolicy` needs new configs to start from
permissions profiles instead of deriving profiles from legacy sandbox
modes. Existing users can have empty `config.toml` files, and we should
not rewrite user-owned config files that may live in shared
repositories.
This PR introduces built-in profile names so an empty config can resolve
to a canonical `PermissionProfile`, while explicit named `[permissions]`
profiles still behave predictably.
## What changed
- Adds built-in `default_permissions` profile names:
- `:read-only` maps to `PermissionProfile::read_only()`.
- `:workspace` maps to the workspace-write profile, including
project-root metadata carveouts.
- `:danger-no-sandbox` maps to `PermissionProfile::Disabled`, preserving
the distinction between no sandbox and a broad managed sandbox.
- Reserves the `:` prefix for built-in profiles so user-defined
`[permissions]` profiles cannot collide with future built-ins.
- Allows `default_permissions` to reference a built-in profile without
requiring a `[permissions]` table.
- Makes an otherwise empty config choose a built-in profile by
trust/platform context: trusted or untrusted project roots use
`:workspace` when the platform supports that sandbox, while roots
without a trust decision use `:read-only`.
- Keeps legacy `sandbox_mode` configs on the legacy path, and still
rejects user-defined `[permissions]` profiles that omit
`default_permissions` so we do not silently guess among custom profiles.
- Preserves compatibility behavior for implicit defaults: bare
`network.enabled = true` allows runtime network without starting the
managed proxy, explicit profile proxy policy still starts the proxy, and
implicit workspace/add-dir roots keep legacy metadata carveouts.
## Verification
- `cargo test -p codex-core builtin --lib`
- `cargo test -p codex-core profile_network_proxy_config`
- `cargo test -p codex-core
implicit_builtin_workspace_profile_preserves_add_dir_metadata_carveouts`
- `cargo test -p codex-core
permissions_profiles_network_enabled_allows_runtime_network_without_proxy`
- `cargo test -p codex-core
permissions_profiles_proxy_policy_starts_managed_network_proxy`
## Documentation
Public Codex config docs should mention these built-in names when the
`[permissions]` config format is ready to document as stable.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19900).
* #20041
* #20040
* #20037
* #20035
* #20034
* #20033
* #20032
* #20030
* #20028
* #20027
* #20026
* #20024
* #20021
* #20018
* #20016
* #20015
* #20013
* #20011
* #20010
* #20008
* __->__ #19900
## Why
Managed sessions use `NO_PROXY` to keep a small set of destinations on
the direct path by default. The old default also bypassed all IPv4
link-local addresses in `169.254.0.0/16`, which includes metadata
endpoints such as `169.254.169.254`. Because `NO_PROXY` is evaluated by
the client before the request reaches the managed proxy, requests to
that range could skip proxy-side allowlist and local-binding checks
entirely. On hosts where a link-local metadata service is reachable,
that creates a path to sensitive environment metadata or credentials
outside the intended enforcement point.
## What changed
- remove the default IPv4 link-local `169.254.0.0/16` bypass from the
managed proxy environment
- keep the existing loopback and private-network defaults unchanged
- update the regression assertion to lock in the narrower default
## Security impact
Link-local requests now stay on the managed-proxy path by default, so
the proxy can apply configured policy before they reach metadata-style
endpoints or other link-local services.
## Verification
- `cargo test -p codex-network-proxy`
Co-authored-by: Codex <noreply@openai.com>
## Summary
This extends external agent detection/import beyond config artifacts so
Codex can detect recent sessions files from the external agent home and
import them into Codex rollout history.
## What changed
- Added a focused `external_agent_sessions` module for:
- session discovery
- source-record parsing
- rollout construction
- import ledger tracking
- Wired session detection/import into the app-server external agent
config API.
- Added compaction handling so large imported sessions can be resumed
safely before the first follow-up turn.
## Testing
Added coverage for:
- recent-session detection
- custom-title handling
- recency filtering
- dedupe and re-detect-after-source-change behavior
- visible imported turn construction
- backward-compatible import payload deserialization
- end-to-end RPC import flow
- rejection of undetected session paths
- repeat-import behavior
- large-session compaction before first follow-up
Ran:
- `cargo test -p codex-app-server external_agent_config_import_ --test
all`
## Summary
- exit shell mode when `Esc` is pressed while the absorbed `!` is the
only input
- add direct regression coverage plus a composer snapshot for the
restored normal prompt state
## Root cause
Shell mode stores the leading `!` outside the editable textarea. After
typing only `!`, the textarea is empty but the composer is still in bash
mode, so the existing empty-composer `Esc` handling never runs.
## Validation
- `just fmt`
- `cargo test -p codex-tui
bottom_pane::chat_composer::tests::esc_exits_empty_shell_mode`
- `cargo test -p codex-tui
bottom_pane::chat_composer::tests::footer_mode_snapshots`
- `cargo insta pending-snapshots`
`cargo test -p codex-tui` still reports unrelated existing `/status`
snapshot drift in this local environment because the rendered
permissions text is `workspace-write with network access` instead of the
older `read-only` fixture text.
Move local resume and fork cwd filtering to `thread/list` instead of
filtering in the TUI. This makes the `/resume` menu feel slightly faster
to load when working in repos with many historical threads, and
centralizes the cwd filtering in app-server.
**Affected:**
- /resume from inside the TUI.
- codex resume with no session ID and without --last
- codex resume --all
- codex fork with no session ID and without --last
- codex fork --all
**Not affected:**
- codex resume <id>
- codex fork <id>
- codex resume --last
- codex fork --last
Steps to test performance improvement in a real Codex environment:
- Launch `codex resume` using compiled binary in a directory that has
seen many threads.
- Launch `codex resume` using release binary in same directory.
- Observe difference in time-to-full-page as threads load.
## Summary
- suggest Plan mode when the current composer draft contains the
standalone word `plan`
- shares the Codex App heuristics for detection
- excludes things line `/plan` and the word plan in shell mode
- reuse the existing `Shift+Tab` mode cycle and add thread-scoped
dismissal with `Esc`
- replace the normal footer hint while the reminder is visible so the
statusline stays anchored
https://github.com/user-attachments/assets/01123ae8-cee6-4e95-b563-44655c071cde
## Why
The desktop app already nudges users toward Plan mode when their draft
clearly signals planning intent. The TUI had the underlying `/plan` and
`Shift+Tab` flows, but no equivalent reminder at the moment the user was
most likely to benefit from them.
## Details
The reminder is shown only when Plan mode is available, the draft
contains standalone `plan`, the user is not already in Plan mode, the
composer is actionable, and the current thread has not dismissed the
reminder. Slash-command and shell-command drafts are excluded.
The first implementation used an extra composer row, but that moved the
statusline whenever the heuristic fired. This version keeps the layout
stable by rendering the reminder in the existing footer row instead.
## Validation
- `INSTA_UPDATE=always cargo test -p codex-tui
chatwidget::tests::plan_mode::plan_mode_nudge -- --nocapture`
- `just fmt`
- `just fix -p codex-tui`
- `./tools/argument-comment-lint/run.py -p codex-tui`
- `cargo insta pending-snapshots`
- `git diff --check`
## Why
Network access approval prompts were showing the generic retry reason,
which made auto-review focus on the blocked connection instead of the
command that caused it. This makes network approvals easier to assess by
telling the reviewer to evaluate whether the triggering command was
authorised by the user and within policy, and to treat the network call
as acceptable when it is a reasonable consequence of that command.
## What changed
- Split guardian approval request prompt rendering so `NetworkAccess`
has a dedicated branch.
- For network requests, show `Network approval context` and `Network
access JSON` instead of `Retry reason` / `Planned action JSON`.
- Added regression coverage for the network approval prompt wording and
for omitting retry reason in this case.
## Verification
- `cargo test -p codex-core
guardian::tests::build_guardian_prompt_items_explains_network_access_review_scope`
## Why
- Without change: MCP tool call spans include request-side details such
as server, tool, call ID, connector, session, and turn.
- Issue: Some useful telemetry is only known by the MCP server after it
handles the tool call, such as target identity or whether the call
triggered a user-facing flow.
## What Changed
- With change: Codex reads allowlisted telemetry from
`_meta["codex/telemetry"]["span"]` and records it on the
`mcp.tools.call` span.
- Adds span fields for `codex.mcp.target.id` and
`codex.mcp.user_flow.triggered`, with strict type checks and bounded
target ID length.
## Verification
`codex-rs/core/src/mcp_tool_call_tests.rs`
## Why
- Without change: MCP tool calls receive
`_meta["x-codex-turn-metadata"]` with `session_id` and `turn_id`.
- Issue: MCP servers may want the turn start timestamp to measure
internal latency relative to turn start.
## What Changed
- With change: turn metadata now includes `turn_started_at_unix_ms`,
which is propagated to MCP tool calls in
`_meta["x-codex-turn-metadata"]`.
## Verification
- `codex-rs/core/src/mcp_tool_call_tests.rs`
- `codex-rs/core/src/turn_metadata_tests.rs`
- `codex-rs/core/src/turn_timing_tests.rs`
- `codex-rs/core/tests/responses_headers.rs`
- `codex-rs/core/tests/suite/search_tool.rs`
## Why
Several bug reports describe thread shutdown (including subagent
threads) leaving stdio MCP server processes behind. These reports all
point at the same lifecycle gap: Codex launches stdio MCP servers, but
the session-level shutdown path does not explicitly close MCP clients or
terminate the server process tree.
Fixes#12491Fixes#12976Fixes#18881Fixes#19469
## History
This is best understood as a regression/coverage gap in MCP session
lifecycle management, not as stdio MCP cleanup being absent all along.
#10710 added process-group cleanup for stdio MCP servers, but that
cleanup only runs when the `RmcpClient`/transport is dropped. The older
reports (#12491 and #12976) came after that cleanup existed, which
suggests the remaining problem was that some higher-level shutdown paths
kept the MCP manager alive or replaced it without explicitly draining
clients. The newer reports (#18881 and #19469) exposed the same family
around manager replacement and shutdown.
## What changed
- Added an explicit stdio MCP process handle in `codex-rmcp-client` so
local MCP servers terminate their process group and executor-backed MCP
servers call the executor process terminator.
- Added `RmcpClient::shutdown()` and manager-level MCP shutdown draining
so session shutdown, channel-close fallback, MCP refresh, and connector
probing stop owned MCP clients.
- Added regression coverage that starts a stdio MCP server, begins an
in-flight blocking tool call, shuts down the client, and asserts the
server process exits.
## Verification
- `cargo test -p codex-rmcp-client`
- `cargo test -p codex-mcp`
- `just fix -p codex-rmcp-client`
- `just fix -p codex-mcp`
- `just fix -p codex-core`
- Manual before/after validation with a temporary repro script:
- Pre-fix binary from `HEAD^` (`fed0a8f4fa`): reproduced the leak with
surviving MCP server and child PIDs, `survivors=[77583, 77592]`,
`leaked=true`.
- Post-fix binary from this branch (`67e318148b`): verified both MCP
processes were gone after interrupting `codex exec`, `survivors=[]`,
`leaked=false`.
## Why
Fixes#19814.
The TUI's current `Worked for ...` timing behavior is a leftover from
#9599. At that point, models could emit multiple assistant messages in
one turn for preambles/commentary, but the TUI did not yet have a
reliable signal that an assistant message was the final answer when it
started streaming. To avoid showing an ever-growing elapsed time on each
preamble separator, #9599 made the separator timer incremental by
tracking elapsed time since the previous separator.
That workaround is no longer the right model for the final
completed-turn display. Since then, #16638 added protocol-native turn
timing, including `duration_ms` on turn completion. With that cumulative
duration available at the point where the TUI renders the completed-turn
separator, the UI can show the actual turn duration directly instead of
carrying per-separator timing state.
## What Changed
- Thread `duration_ms` into `ChatWidget::on_task_complete` from both
legacy `TurnCompleteEvent` handling and app-server `TurnCompleted`
notifications.
- Use `duration_ms` for the final `Worked for ...` separator, falling
back to the status indicator timer only when the protocol duration is
unavailable.
- Keep mid-turn separators before later assistant text as plain visual
dividers instead of clocked `Worked for ...` separators.
- Remove the old incremental separator timer state and helper
(`last_separator_elapsed_secs` / `worked_elapsed_from`).
- Add a snapshot regression test for a turn that runs a command and then
completes with a final answer, verifying the final separator uses the
cumulative turn duration.
## Verification
- `cargo test -p codex-tui
final_worked_for_uses_cumulative_turn_duration_snapshot`
- `just fix -p codex-tui`
Manual repro prompt:
```text
Manual timing repro. First send a short preamble/commentary sentence before using tools. Then run exactly this shell command: sleep 75; echo MANUAL_TIMING_DONE. After the command finishes, give a final answer that says "done". Do not skip the preamble.
```
After this change, the mid-turn break before the final answer should be
a plain divider, and the final completed-turn separator should show
`Worked for ...` using the cumulative turn duration.
Before:
<img width="414" height="102" alt="Screenshot 2026-04-27 at 10 09 01 PM"
src="https://github.com/user-attachments/assets/b9e2ce01-2460-40e4-a5c4-c9ba8add2557"
/>
After:
<img width="485" height="149" alt="Screenshot 2026-04-27 at 10 09 07 PM"
src="https://github.com/user-attachments/assets/d24089ae-d4e2-41b6-b966-07c98706ead4"
/>
## Summary
Make FileSystemSandboxPolicy the semantic source of truth for project
root metadata protection. Under writable roots, `.git`, `.codex`, and
`.agents` stay protected unless user policy grants an explicit write
rule for that metadata path.
## Scope
1. Add `protected_metadata_names` to `WritableRoot`.
2. Teach `FileSystemSandboxPolicy::can_write_path_with_cwd` to reject
protected metadata writes under writable roots unless explicitly
allowed.
3. Default workspace write profiles to protect `.git`, `.codex`, and
`.agents`.
4. Add the Linux fallback setup needed before Linux enforcement lands
later in the stack.
## Reviewer Focus
1. The policy decision belongs in FileSystemSandboxPolicy, not shell
command parsing.
2. Legacy SandboxPolicy remains a compatibility projection, not the
source of the new rule.
3. Explicit user write rules can still opt into these metadata paths.
## Stack
1. Policy primitive: this PR
2. macOS Seatbelt adapter: #19847
3. Shell preflight UX: #19848
4. Runtime profile propagation: #19849
5. Linux bubblewrap adapter: #19852
## Validation
1. codex protocol permissions tests
2. formatting for codex protocol and codex linux sandbox
3. diff whitespace check
## Why
The TUI currently handles keyboard shortcuts as hard-coded event matches
spread across app, composer, pager, list, approval, and navigation code.
That makes shortcuts hard to customize, makes displayed hints easy to
drift from actual behavior, and makes future keymap work riskier because
there is no central action inventory.
This PR adds the foundation for configurable, action-based keymaps
without adding the interactive remapping UI yet. Onboarding
intentionally stays on fixed startup shortcuts because users cannot
reasonably configure keymaps before completing onboarding.
This is PR1 in the keymap stack:
- PR1: #18593: configurable keymap foundation
- PR2: #18594: `/keymap` picker and guided remapping UI
- PR3: #18595: Vim composer mode and the remap option
## Design Notes
The new model resolves named actions into concrete runtime bindings once
from config, then passes those bindings to the UI surfaces that handle
input or render shortcut hints.
The main concepts are:
- **Context**: a scope where an action is active, such as `global`,
`chat`, `composer`, `editor`, `pager`, `list`, or `approval`.
- **Action**: a named operation inside a context, such as
`global.open_transcript`, `composer.submit`, or `pager.close`.
- **Binding**: one or more single-key shortcuts assigned to an action,
written as config strings such as `ctrl-t`, `alt-backspace`, or
`page-down`. Multi-step sequences such as `ctrl-x ctrl-s`, `g g`, or
leader-key flows are not part of this PR.
- **Resolution order**: context-specific config wins first, supported
global fallbacks come next, and built-in defaults fill in anything
unset.
- **Explicit unbinding**: an empty array removes an action binding in
that scope and does not fall through to a fallback binding.
- **Conflict validation**: a resolved keymap rejects duplicate active
bindings inside the same scope so one keypress cannot dispatch two
actions.
## What Changed
- Added `TuiKeymap` config support under `[tui.keymap]`, including typed
contexts/actions, key alias normalization, generated schema coverage,
and user-facing config errors.
- Added `RuntimeKeymap` resolution in `codex-rs/tui/src/keymap.rs`,
including fallback precedence, built-in defaults, explicit unbinding,
and per-context conflict validation.
- Rewired existing TUI handlers to consume resolved keymap actions
instead of directly matching hard-coded keys in each component.
- Updated key hint rendering and footer/pager/list surfaces so displayed
shortcuts follow the resolved keymap.
- Kept onboarding shortcuts fixed in
`codex-rs/tui/src/onboarding/keys.rs` instead of exposing them through
`[tui.keymap]`.
## Validation
The branch includes focused coverage for config parsing, key
normalization, runtime fallback resolution, explicit unbinding,
duplicate-key conflict validation, default keymap consistency,
onboarding startup key behavior, and UI hint snapshots affected by
resolved key bindings.
## Why
Codex enables enhanced keyboard reporting while the TUI owns the
terminal. In iTerm2, exiting the TUI with Ctrl+C can intermittently
leave the parent shell receiving raw CSI-u / `modifyOtherKeys` fragments
instead of normal key input.
Final terminal cleanup should put the parent shell back into normal
keyboard reporting even if the terminal misses the usual stack pop.
Fixes#19553.
## What Changed
- Move TUI keyboard enhancement setup and detection into
`tui/src/tui/keyboard_modes.rs`.
- Add an exit-only `restore_after_exit()` path that performs the normal
keyboard enhancement pop plus unconditional keyboard enhancement and
`modifyOtherKeys` resets.
- Keep temporary restore paths, such as external-editor handoff, using
the balanced stack pop behavior.
## Confidence
Medium. This is a speculative fix: I was not able to reproduce the
reported iTerm2 behavior manually, but the symptoms line up with
terminal keyboard reporting state surviving Codex exit. The added reset
sequences are scoped to final TUI shutdown and should be harmless when
the terminal is already clean.
## Why
Memory startup runs in the background after an eligible turn, but it can
consume Codex backend quota at exactly the wrong time: when the user is
already near a rate-limit boundary. This PR adds a guard so the memory
pipeline backs off when the Codex rate-limit snapshot says the remaining
budget is too low.
## What Changed
- Added `memories.min_rate_limit_remaining_percent` with a default of
`25`, clamped to `0..=100`, and regenerated `core/config.schema.json`.
- Added `codex-rs/memories/write/src/guard.rs`, which fetches Codex
backend rate limits before memory startup and skips phase 1 / phase 2
when the Codex limit is reached or either tracked window is above the
configured usage ceiling.
- Keeps startup best-effort: non-Codex auth or rate-limit fetch/client
failures preserve the existing memory startup behavior.
- Records a `codex.memory.startup` counter with
`status=skipped_rate_limit` when startup is skipped.
- Added config parsing/clamping coverage and guard unit tests.
## Verification
- Added `codex-rs/memories/write/src/guard_tests.rs` for threshold,
primary/secondary window, and reached-limit behavior.
- Added config tests for TOML parsing and clamping.
## Summary
AgentIdentity runtime loading currently registers tasks against a single
hardcoded AuthAPI base URL. That works for production, but local and
staging validation may need registration to target a different
authapi-login-provider without baking internal staging service URLs into
the OSS binary.
This PR adds a small config surface for
`agent_identity_authapi_base_url` and threads it through the existing
auth-loading path as a direct argument. Explicit config wins. Without
config, task registration keeps using the production AuthAPI URL,
matching the current default behavior.
## Stack
1. openai/codex#19762 - `refactor: make auth loading async` (merged)
2. openai/codex#19763 - `refactor: load agent identity runtime eagerly`
3. This PR - `fix: configure AgentIdentity AuthAPI base URL`
4. openai/codex#19764 - `feat: verify agent identity JWTs with JWKS`
## Design decisions
- Keep the existing auth-loading shape and pass the new value as an
argument. This avoids another wrapper loader and keeps the call path
readable.
- Add config instead of embedding internal staging URLs. Environments
that need a non-production AuthAPI can configure it explicitly.
- Keep the default AuthAPI registration URL as production.
`chatgpt_base_url` remains separate and is used by the follow-up JWKS
verification PR for fetching public keys from the ChatGPT backend route.
- Resolve the AuthAPI base URL inside AgentIdentity loading, because
task registration is the only consumer of this value.
## Testing
Tests: targeted Rust checks, AgentIdentity auth tests, config schema
regeneration, formatter/fix pass, and whitespace diff check.
## Why
Memory startup was tied to thread lifecycle events such as create, load,
and fork. That can run memory work before a thread receives real user
input, and it makes startup cost scale with thread management instead of
actual turns. Moving the trigger to `thread/sendInput` keeps memory
startup aligned with the first real user turn and lets it use the
current thread config at turn time.
The idea is to prevent ghost cost due to pre-warm triggered by the app
Turn-based startup can also make global phase-2 consolidation easier to
request repeatedly, so this adds a success cooldown and tightens the
default startup scan window.
## What Changed
- Start `codex_memories_write::start_memories_startup_task` after a
non-empty `thread/sendInput` turn is submitted, instead of from thread
create/load/fork paths:
d4a6885b78/codex-rs/app-server/src/codex_message_processor.rs (L6477-L6487)
- Expose `CodexThread::config()` so app-server can pass the live config
into memory startup at turn time.
- Add a six-hour successful-run cooldown for global phase-2
consolidation via `SkippedCooldown`:
d4a6885b78/codex-rs/state/src/runtime/memories.rs (L963-L966)
- Reduce memory startup defaults to at most 2 rollouts over 10 days:
d4a6885b78/codex-rs/config/src/types.rs (L31-L34)
## Verification
Updated the memory runtime coverage around phase-2 reclaim behavior,
including `phase2_global_lock_respects_success_cooldown`.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Phase 2 still needs to choose the most relevant stage-1 memory outputs
by usage and recency, but exposing that ranking as the rendered
`raw_memories.md` order creates unnecessary large diff. Usage-count or
timestamp changes can reshuffle otherwise unchanged memories, making the
workspace diff noisy and giving the consolidation prompt a misleading
recency signal from file position.
This fix will reduce token consumption
## What Changed
- Keep the existing top-N Phase 2 selection ranking by `usage_count`,
`last_usage`, `source_updated_at`, and `thread_id`.
- Return the selected rows in stable ascending `thread_id` order before
syncing Phase 2 filesystem inputs.
- Update the memory README, raw memories header, and consolidation
prompt so they describe the stable order and tell the prompt to use
metadata and workspace diffs instead of file order as the recency
signal.
- Adjust the memory runtime tests to use deterministic thread IDs and
assert the stable return order separately from the ranked selection
semantics.
## Test Coverage
- Existing memory runtime tests in
`codex-rs/state/src/runtime/memories.rs` now cover the stable returned
ordering for Phase 2 inputs.
---------
Co-authored-by: Codex <noreply@openai.com>
Keep extracting memories out of core and moving the write trigger in the
app-server
This is temporary and it should move at the client level as a follow-up
This makes core fully independant from `codex-memories-write`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
MultiAgentV2 sessions need startup guidance that matches the role of the
thread that is actually being created. Root agents and subagents have
different responsibilities, and forked subagents can inherit parent
rollout history. If the parent hint is carried into the child context,
the child can see stale or conflicting developer guidance before its own
session-specific context is added.
## What changed
- Added `features.multi_agent_v2.root_agent_usage_hint_text` and
`features.multi_agent_v2.subagent_usage_hint_text` config fields,
including schema/config parsing support.
- Injected the matching root or subagent hint into the initial context
as its own developer message when `multi_agent_v2` is enabled.
- Filtered configured MultiAgentV2 usage-hint developer messages out of
forked parent history so a child thread receives fresh guidance for its
own session source/config.
- Added targeted coverage for config parsing, initial-context rendering,
feature-config deserialization, and forked-history filtering.
## Context examples
With this config:
```toml
[features.multi_agent_v2]
enabled = true
root_agent_usage_hint_text = "Root guidance."
subagent_usage_hint_text = "Subagent guidance."
```
A root thread initial context renders the root hint as a standalone
developer message:
```text
[developer]
<existing developer context, when present>
[developer]
Root guidance.
```
A subagent thread initial context renders the subagent hint instead:
```text
[developer]
<existing developer context, when present>
[developer]
Subagent guidance.
```
When a subagent forks parent history, any parent developer message whose
text exactly matches the configured MultiAgentV2 root or subagent hint
is omitted from the forked history before the child receives its fresh
subagent hint.
## Why
Addresses #9274
Running `codex update` currently starts an interactive Codex session
with `update` as the prompt. That is a rough edge for users who expect a
direct self-update command after seeing the existing update notice, and
it forces them to copy the suggested package-manager command manually.
## What changed
- Added a top-level `codex update` subcommand.
- Reused the existing install-channel detection and update command
runner that the TUI already uses for update prompts.
- Exposed the update-action lookup from `codex-tui` so the CLI can
invoke the same behavior.
- Added CLI coverage to ensure `codex update` is parsed as a subcommand
instead of becoming an interactive prompt.
## Verification
- `cargo test -p codex-cli`
- `cargo test -p codex-tui update_action::tests`
## Why
`PermissionProfile` is now the canonical internal permissions
representation, but the app-server wire shape is still intentionally
unstable while the migration continues. Stable app-server clients should
not see or generate code for these fields until the wire format settles.
## What changed
- Marks every app-server v2 field that sends `PermissionProfile` as
experimental, including `command/exec`, `thread/start`, `thread/resume`,
`thread/fork`, and `turn/start` request/response payloads.
- Enables per-field experimental inspection for `command/exec`, so
`permissionProfile` is gated without making the entire method
experimental.
- Fixes the generated TypeScript schema filter to be comment-aware. The
previous scanner treated apostrophes inside doc comments as string
delimiters, so some experimental fields leaked into stable TypeScript
even though stable JSON was filtered correctly.
## Verification
- `cargo test -p codex-app-server-protocol`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19899).
* #19900
* __->__ #19899
## Why
After thread sessions have a required `PermissionProfile`, the TUI no
longer needs to cache a separate legacy `SandboxPolicy` in
`ThreadSessionState`. Keeping the legacy field would reintroduce two
permission authorities in the session cache and make later
replay/switching logic easier to get wrong.
This PR keeps legacy app-server compatibility at the ingestion boundary:
old `sandbox` response values are still accepted, but they are
immediately converted to a cwd-anchored profile.
## What Changed
- Removes `ThreadSessionState.sandbox_policy`.
- Updates active-session permission syncing to write only the current
`PermissionProfile`.
- Updates thread-read/replay/test fixtures to use profiles as the cached
session permission source.
- Leaves legacy `sandbox` fields in app-server request/response protocol
paths unchanged; those are compatibility boundaries and are converted
before entering cached TUI state.
## Verification
- `cargo test -p codex-tui thread_session_state::tests --lib`
- `cargo test -p codex-tui
inactive_thread_started_notification_initializes_replay_session --lib`
- `cargo test -p codex-tui thread_events --lib`
- `just fix -p codex-tui`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19776).
* #19900
* #19899
* __->__ #19776
## Why
Remote TUI resume uses the app-server websocket client. That client
inherited tungstenite's default `16 MiB` frame limit, so a large saved
session could make `thread/resume` return a single JSON-RPC response
frame that the client rejected before the TUI could deserialize or
render it.
Fixes#19837
## What Changed
- Configure the remote app-server websocket client with a bounded `128
MiB` max frame/message size.
- Preserve the concrete remote worker exit reason when completing
pending requests after a transport/read failure instead of replacing it
with a generic channel-closed error.
- Add a regression test that sends a single `>16 MiB` JSON-RPC response
frame and verifies the typed request succeeds.
Note: This isn't a perfect fix. It really just moves the limit to a much
larger value. I looked at a bunch of other potential fixes (both
server-side and client-side), and they all involved significant
complexity, had backward-compatibility impact, or impacted performance
of common use cases. This simple fix should address the vast majority of
remote use cases.
## Verification
I reproed the problem locally using a long rollout. Verified that fix
addresses connection drop.
## Why
`ThreadConfigSnapshot` is used by app-server and thread metadata code as
a stable view of active runtime settings. Keeping both `sandbox_policy`
and `permission_profile` in the snapshot duplicates permission state and
makes it possible for the legacy projection to drift from the canonical
profile.
The legacy `sandbox` value is still needed at app-server compatibility
boundaries, so this PR derives it on demand from the snapshot profile
and cwd instead of storing it.
## What Changed
- Removes `ThreadConfigSnapshot.sandbox_policy`.
- Adds `ThreadConfigSnapshot::sandbox_policy()` as a compatibility
projection from `permission_profile` plus `cwd`.
- Updates app-server response/metadata code and tests to call the
projection only where legacy fields still exist.
- Keeps snapshot construction profile-only so split filesystem rules,
disabled enforcement, and external enforcement remain represented by the
canonical profile.
## Verification
- `cargo test -p codex-app-server
thread_response_permission_profile_preserves_enforcement --lib`
- `cargo test -p codex-core
dispatch_reclaims_stale_global_lock_and_starts_consolidation --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19775).
* #19900
* #19899
* #19776
* __->__ #19775
## Why
`SessionConfiguredEvent` is the internal event that tells clients what
permissions are active for a session. Emitting both `sandbox_policy` and
`permission_profile` leaves two possible authorities and forces every
consumer to decide which one to honor. At this point in the migration,
the profile is expressive enough to represent managed, disabled, and
external sandbox enforcement, so the internal event can be profile-only.
The wire compatibility concern is older serialized events or rollout
data that only contain `sandbox_policy`; those still need to
deserialize.
## What Changed
- Removes `sandbox_policy` from `SessionConfiguredEvent` and makes
`permission_profile` required.
- Adds custom deserialization so old payloads with only `sandbox_policy`
are upgraded to a cwd-anchored `PermissionProfile`.
- Updates core event emission and TUI session handling to sync
permissions from the profile directly.
- Updates app-server response construction to derive the legacy
`sandbox` response field from the active thread snapshot instead of from
`SessionConfiguredEvent`.
- Updates yolo-mode display logic to treat both
`PermissionProfile::Disabled` and managed unrestricted filesystem plus
enabled network as full-access, while still preserving the distinction
between no sandbox and external sandboxing.
## Verification
- `cargo test -p codex-protocol session_configured_event --lib`
- `cargo test -p codex-protocol serialize_event --lib`
- `cargo test -p codex-exec session_configured --lib`
- `cargo test -p codex-app-server
thread_response_permission_profile_preserves_enforcement --lib`
- `cargo test -p codex-core
session_configured_reports_permission_profile_for_external_sandbox
--lib`
- `cargo test -p codex-tui session_configured --lib`
- `cargo test -p codex-tui
yolo_mode_includes_managed_full_access_profiles --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19774).
* #19900
* #19899
* #19776
* #19775
* __->__ #19774
## Why
Fixes#19475.
`codex exec` can finish successfully and then emit an `ERROR` on stderr:
```text
failed to record rollout items: thread <id> not found
```
That happens because shutdown closes the live thread writer before
emitting `ShutdownComplete`. The terminal event was still using the
normal `send_event_raw` path, so it tried to append rollout items
through a recorder that had already been removed. The answer is correct,
but wrappers that treat stderr as failure can retry completed exec runs.
This looks like a likely recent regression from
[#18882](https://github.com/openai/codex/pull/18882), which routed live
thread writes through `ThreadStore` and added the shutdown-time live
writer close. I have not bisected this, so the PR treats #18882 as the
likely source based on the affected shutdown code path rather than a
proven first-bad commit.
## What Changed
`ShutdownComplete` now bypasses rollout persistence after thread
shutdown and is delivered directly to clients. The shutdown path still
records the protocol event in the rollout trace before delivery,
preserving trace visibility without attempting a post-shutdown
thread-store append.
The change also adds a regression test with the in-memory thread store
to assert that shutdown creates and shuts down the live thread without
appending another item after shutdown.
Addresses #19856
## Summary
- Clarifies that external code contributions are invitation only.
- Points contributors to `docs/contributing.md` for the full policy
instead of using the previous warning phrasing.
## Summary
Adds the standard Codex `User-Agent` to shared default headers so the
responses-api WS handshake carries the same client OS and version
context as HTTP requests.
## Testing
- `cargo test -p codex-core
build_ws_client_metadata_includes_window_lineage_and_turn_metadata`
- `cargo test -p codex-core --test all responses_websocket`
## Summary
AgentIdentity auth previously registered the process task lazily behind
a `OnceCell`. That meant the auth object could be constructed before its
runtime task binding was known.
This PR makes AgentIdentity auth load the runtime task at auth load time
and stores the resulting process task id directly on the auth object.
The model-provider call path can then read a concrete task id instead of
handling a missing lazy value.
## Stack
1. [refactor: make auth loading
async](https://github.com/openai/codex/pull/19762) (merged)
2. **This PR:** [refactor: load AgentIdentity runtime
eagerly](https://github.com/openai/codex/pull/19763)
3. [fix: configure AgentIdentity AuthAPI base
URL](https://github.com/openai/codex/pull/19904)
4. [feat: verify AgentIdentity JWTs with
JWKS](https://github.com/openai/codex/pull/19764)
## Important call sites
| Area | Change |
| --- | --- |
| `AgentIdentityAuth::load` | Registers the process task during auth
loading and stores `process_task_id`. |
| `CodexAuth::from_agent_identity_jwt` | Awaits AgentIdentity auth
loading. |
| model-provider auth | Reads a concrete `process_task_id` instead of an
optional lazy value. |
| AgentIdentity auth tests | Mock task registration now covers eager
runtime allocation. |
## Design decisions
AgentIdentity auth now treats task registration as part of constructing
a usable auth object. That matches how callers use the value: once auth
is present, the model-provider path expects the task-scoped assertion
data to be ready.
## Testing
Tests: targeted Rust auth test compilation, formatter, scoped Clippy
fix, and Bazel lock check.
- Marks `/title` and `/statusline` as available during active tasks.
- Extends the existing slash-command availability test coverage to
include these commands alongside `/goal`.
## Why
`ThreadSessionState` is the TUI's cached view of an app-server session.
To make `PermissionProfile` the canonical runtime permissions model,
cached thread sessions need to always have a profile instead of treating
the profile as an optional supplement to a legacy `sandbox` response
field.
The main compatibility concern is older app-server v2 lifecycle
responses that only include `sandbox` and omit `permissionProfile`:
- `thread/start` -> `ThreadStartResponse.sandbox`
- `thread/resume` -> `ThreadResumeResponse.sandbox`
- `thread/fork` -> `ThreadForkResponse.sandbox`
Those responses must still hydrate correctly when the TUI is pointed at
an older app-server. This PR converts the legacy `sandbox` value into a
`PermissionProfile` immediately at response ingestion time, using the
response `cwd`, so cached sessions do not carry an optional profile that
can later reinterpret cwd-bound grants against a different thread cwd.
This fallback is intentionally boundary compatibility. The follow-up PRs
in this stack continue the cleanup by making `SessionConfiguredEvent`
profile-only, deriving sandbox projections from snapshots only when an
API still needs them, and then removing `sandbox_policy` from
`ThreadSessionState`.
## What Changed
- Makes `ThreadSessionState.permission_profile` required.
- Converts legacy app-server response `sandbox` values into a
`PermissionProfile` at ingestion time using the response cwd.
- Ensures `thread/read` hydration does not reuse a primary session
profile that may be anchored to a different cwd; it uses the active
widget permission settings for the read thread fallback instead of
reusing cached primary-session permissions.
- Keeps the app-server request path unchanged: embedded sessions send
profiles, while remote sessions continue using legacy sandbox overrides
for compatibility.
## Verification
- `cargo test -p codex-tui thread_read --lib`
- `cargo test -p codex-tui
permission_settings_sync_preserves_active_profile_only_rules --lib`
- `cargo test -p codex-tui
resume_response_restores_turns_from_thread_items --lib`
- `cargo test -p codex-tui thread_session_state::tests --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19773).
* #19900
* #19899
* #19776
* #19775
* #19774
* __->__ #19773
## Summary
- Remove `ghost_snapshot` / `GhostCommit` from the Responses API surface
and generated SDK/schema artifacts.
- Keep legacy config loading compatible, but make undo a no-op that
reports the feature is unavailable.
- Clean up core history, compaction, telemetry, rollout, and tests to
stop carrying ghost snapshot items.
## Testing
- Unit tests passed for `codex-protocol`, `codex-core` targeted undo and
compaction flows, `codex-rollout`, and `codex-app-server-protocol`.
- Regenerated config and app-server schemas plus Python SDK artifacts
and verified they match the checked-in outputs.
## Why
Recent `main` CI had repeated flakes in the plugin fixture tests:
- `codex-core::all
suite::plugins::explicit_plugin_mentions_inject_plugin_guidance` failed
in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).
- `codex-core::all suite::plugins::plugin_mcp_tools_are_listed` failed
in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).
The failures were in the same plugin/MCP fixture family: assertions
expected sample plugin guidance or tool inventory, but the test could
observe the session before the sample MCP server had finished startup.
## Root Cause
`explicit_plugin_mentions_inject_plugin_guidance` submitted the user
turn immediately after constructing the session. MCP startup is
asynchronous, so on a slower or busier CI runner the prompt could be
built before the sample plugin MCP server had reported its tools. That
made the test depend on scheduler timing rather than the fixture being
ready.
`plugin_mcp_tools_are_listed` already needed the same readiness
condition, but its wait logic was local to that test.
## What Changed
- Added a shared `wait_for_sample_mcp_ready` helper for the plugin
fixture tests.
- Wait for `McpStartupComplete` before submitting the explicit plugin
mention turn.
- Reuse the same readiness helper in the MCP tool-listing test.
## Why This Should Be Reliable
The tests now wait for the explicit readiness signal from the sample MCP
server before asserting guidance or tools derived from that server. This
removes the startup race while still exercising the real fixture path,
so the assertions should only run after the plugin inventory is
deterministic.
## Verification
- `cargo test -p codex-core --test all plugins::`
- GitHub CI for this PR is passing.
## Summary
- Extracted the shared filesystem types and `ExecutorFileSystem` trait
into a new `codex-file-system` crate
- Switched `codex-config` and `codex-git-utils` to depend on that crate
instead of `codex-exec-server`
- Kept `codex-exec-server` re-exporting the same API for existing
callers
## Testing
- Ran `cargo test -p codex-file-system`
- Ran `cargo test -p codex-git-utils`
- Ran `cargo test -p codex-config`
- Ran `cargo test -p codex-exec-server`
- Ran `just fix -p codex-file-system`, `just fix -p codex-git-utils`,
`just fix -p codex-config`, `just fix -p codex-exec-server`
- Ran `just fmt`
- Updated and verified the Bazel module lockfile
## Summary
Disallow fileParams metadata for custom MCPs
Restricts Codex openai/fileParams handling to the first-party codex_apps
MCP server. Custom MCP servers may still advertise the metadata, but
Codex now ignores it for upload rewriting, preventing non-Apps tools
from receiving signed OpenAI file refs for local paths. Added a
regression test for the allowed and denied cases.
## Why
This continues the permissions migration by making legacy config default
resolution produce the canonical `PermissionProfile` first. The legacy
`SandboxPolicy` projection should stay available at compatibility
boundaries, but config loading should not create a legacy policy just to
immediately convert it back into a profile.
Specifically, when `default_permissions` is not specified in
`config.toml`, instead of creating a `SandboxPolicy` in
`codex-rs/core/src/config/mod.rs` and then trying to derive a
`PermissionProfile` from it, we use `derive_permission_profile()` to
create a more faithful `PermissionProfile` using the values of
`ConfigToml` directly.
This also keeps the existing behavior of `sandbox_workspace_write` and
extra writable roots after #19841 replaced `:cwd` with `:project_roots`.
Legacy workspace-write defaults are represented as symbolic
`:project_roots` write access plus symbolic project-root metadata
carveouts. Extra absolute writable roots are still added directly and
continue to get concrete metadata protections for paths that exist under
those roots.
The platform sandboxes differ when a symbolic project-root subpath does
not exist yet.
* **Seatbelt** can encode literal/subpath exclusions directly, so macOS
emits project-root metadata subpath policies even if `.git`, `.agents`,
or `.codex` do not exist.
* **bwrap** has to materialize bind-mount targets. Binding `/dev/null`
to a missing `.git` can create a host-visible placeholder that changes
Git repo discovery. Binding missing `.agents` would not affect Git
discovery, but it would still create a host-visible project metadata
placeholder from an automatic compatibility carveout. Linux therefore
skips only missing automatic `.git` and `.agents` read-only metadata
masks; missing `.codex` remains protected so first-time project config
creation goes through the protected-path approval flow. User-authored
`read` and `none` subpath rules keep normal bwrap behavior, and `none`
can still mask the first missing component to prevent creation under
writable roots.
## What Changed
- Adds profile-native helpers for legacy workspace-write semantics,
including `PermissionProfile::workspace_write_with()`,
`FileSystemSandboxPolicy::workspace_write()`, and
`FileSystemSandboxPolicy::with_additional_legacy_workspace_writable_roots()`.
- Makes `FileSystemSandboxPolicy::workspace_write()` the single legacy
workspace-write constructor so both `from_legacy_sandbox_policy()` and
`From<&SandboxPolicy>` include the project-root metadata carveouts.
- Removes the no-carveout `legacy_workspace_write_base_policy()` path
and the `prune_read_entries_under_writable_roots()` cleanup that was
only needed by that split construction.
- Adds `ConfigToml::derive_permission_profile()` for legacy sandbox-mode
fallback resolution; named `default_permissions` profiles continue
through the permissions profile pipeline instead of being reconstructed
from `sandbox_mode`.
- Updates `Config::load()` to start from the derived profile, validate
that it still has a legacy compatibility projection, and apply
additional writable roots directly to managed workspace-write filesystem
policies.
- Updates Linux bwrap argument construction so missing automatic
`.git`/`.agents` symbolic project-root read-only carveouts are skipped
before emitting bind args; missing `.codex`, user-authored `read`/`none`
subpath rules, and existing missing writable-root behavior are
preserved.
- Adds coverage that legacy workspace-write config produces symbolic
project-root metadata carveouts, extra legacy workspace writable roots
still protect existing metadata paths such as `.git`, and bwrap skips
missing `.git`/`.agents` project-root carveouts while preserving missing
`.codex` and user-authored missing subpath rules.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19772).
* #19776
* #19775
* #19774
* #19773
* __->__ #19772
## Why
The remaining review, interrupt, fuzzy search, feedback, and git-diff
handlers still had local send-error branches that obscured otherwise
simple request handling. This final slice flattens those handlers
without changing the public protocol behavior.
## What Changed
- Streamlined review start, turn interrupt, fuzzy search session,
feedback upload, and git diff handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.
- Converted validation and upload failures into returned JSON-RPC errors
where that avoids nested `send_error`/`return` blocks.
- Left unrelated sandbox setup and notification code untouched.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::review --
--test-threads=1`
## Summary
- Add the `enable_mcp_apps` feature flag to the `codex-features`
registry
- Keep it under development and disabled by default
## Testing
- Unit tests for `codex-features` passed
- Formatting passed
Implements #18162
This updates the TUI terminal title to show an explicit action-required
state when Codex is blocked on user approval or input. The terminal
title now uses the activity title item to cover both active work and
blocked-on-user states, while still accepting the legacy spinner config
value.
Changes
- Rename the terminal title item from `spinner` to `activity` while
preserving legacy config compatibility
- Show `[ ! ] Action Required `while approval or input overlays are
active, with a blinking `[ . ]` alternate state
- Suppress the normal working spinner while Codex is blocked on user
action
- Add targeted coverage for action-required title behavior and legacy
title-item parsing
Testing
- Trigger an approval or input modal and confirm the tab title
alternates between `[ ! ] Action Required` and `[ . ] Action Required`
- Disable the activity title item and confirm the action-required title
does not appear
- Resolve the prompt and confirm the title returns to the normal
spinning/idel state
https://github.com/user-attachments/assets/e9ecc530-a6be-4fd7-b9a6-d550a790eb2c
## Why
Turn and realtime handlers had nested validation and send-error branches
that made the request path longer than the behavior warranted. This
slice keeps the same request semantics while letting the handlers return
errors from the failing step.
## What Changed
- Streamlined turn start, injected item, and turn steer request handling
in `codex-rs/app-server/src/codex_message_processor.rs`.
- Applied the same result-returning shape to realtime session response
handlers.
- Preserved existing request validation and thread-manager interactions.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::turn_start --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::turn_steer --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::thread_inject_items --
--test-threads=1`
## Why
Thread resume and fork had some of the deepest error-handling
indentation in this area because helpers emitted request errors
directly. Returning those failures gives the handlers a single request
boundary while preserving the async pending-resume behavior.
## What Changed
- Converted thread resume helpers in
`codex-rs/app-server/src/codex_message_processor.rs` to return `Result`
values for validation and view loading failures.
- Applied the same pattern to thread fork request handling.
- Simplified pending resume error construction by using the shared
JSON-RPC error helpers.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::thread_resume --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::thread_fork --
--test-threads=1`
Records cancelled inference streams when Codex stops consuming a
provider response before `response.completed`, preserving complete
output items observed before cancellation.
Also closes still-running inference calls when the owning turn ends, so
reduced rollout traces do not leave stale `Running` inference nodes.
Covered by focused reducer coverage and a core stream-drop test for
partial output preservation.
## Why
The thread read/list handlers mostly assemble views, but their error
handling was interleaved with response emission. Returning view-building
errors from the helper path keeps those handlers focused on data
assembly.
## What Changed
- Added a small mapper for `ThreadReadViewError` to JSON-RPC errors in
`codex-rs/app-server/src/codex_message_processor.rs`.
- Streamlined thread list, loaded-thread, read, turn-list, and summary
handlers to produce result values for the request boundary.
- Kept the existing invalid-request vs internal-error distinctions for
missing or unreadable thread data.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all conversation_summary --
--test-threads=1`
**note**: a large chunk of this diff comes from regenerating Python
types after app-server schema changes on `main`.
This is PR 3 of 3 for the Python SDK PyPI publishing split. PR #18862
refreshed the generated SDK surface, and PR #18865 made the runtime
package publishable as `openai-codex-cli-bin`; this final PR makes the
SDK package publishable as `openai-codex-app-server-sdk` and pins both
packages to the same Codex runtime version.
The key idea is that the published SDK version is the Codex runtime
version. That one version now drives the SDK package version, the exact
runtime dependency, the client version reported by the SDK, and the
bootstrap runtime pin. This keeps release-time versioning in one lane
instead of scattering checked-in literals through the package.
## What changed
- Rename the SDK distribution from `codex-app-server-sdk` to
`openai-codex-app-server-sdk` for conflict-free PyPI publishing.
- Use `stage-sdk --codex-version ...` with one Codex version for both
the SDK package version and exact `openai-codex-cli-bin` dependency.
- Preserve hidden legacy `--runtime-version` / `--sdk-version` args only
to reject mismatched versions during staging.
- Map PEP 440 package versions back to Codex release tags for runtime
setup downloads, e.g. `0.116.0a1` -> `rust-v0.116.0-alpha.1`.
- Derive `codex_app_server.__version__`, the default
`AppServerConfig.client_version`, and
`_runtime_setup.pinned_runtime_version()` from the SDK package/project
version instead of hardcoding duplicate version strings.
- Carry the current generated SDK refresh from `main` so
`generate-types` stays clean after recent app-server schema changes.
- Update `sdk/python/uv.lock` for the renamed editable package.
## Validation
- `uv run --extra dev pytest` in `sdk/python` -> 59 passed, 37 skipped.
- Targeted `uv run ruff check` for the touched SDK files.
- `git diff --check`.
- Staged runtime with `--codex-version rust-v0.116.0-alpha.1
--platform-tag macosx_11_0_arm64`.
- Staged SDK with `--codex-version rust-v0.116.0-alpha.1`.
- Built runtime wheel, SDK wheel, and SDK sdist.
- `twine check /tmp/codex-python-pr3-build/dist/*` -> passed.
- Clean venv smoke installed `openai-codex-app-server-sdk==0.116.0a1`
from local dist and pulled `openai-codex-cli-bin==0.116.0a1`.
- Smoke imports passed for `Codex` and `bundled_codex_path()`.
## Summary
- shard `//codex-rs/exec:exec-all-test` into 8 Bazel shards
- keep the existing `no-sandbox` test tag unchanged
## Why
The Windows Bazel lane has been timing out this aggregated integration
test target at the default 300s test timeout. The target runs the
combined `codex-rs/exec/tests/all.rs` integration binary; sharding lets
Bazel split the Rust test cases across parallel test actions instead of
running the whole integration suite as one long action.
## Validation
Not run locally, per the Codex repo workflow for development-phase
changes.
Co-authored-by: Codex <noreply@openai.com>
## Why
Thread mutation handlers had many short error branches whose only job
was to emit a JSON-RPC error and stop. This slice keeps those errors
visible, but lets each handler build a result and return early from
validation helpers instead of nesting the main path.
## What Changed
- Streamlined thread archive/unarchive, rename, memory, metadata,
rollback, compact, background terminal, shell, and guardian handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.
- Reused shared JSON-RPC error constructors in
`codex-rs/app-server/src/bespoke_event_handling.rs` for rollback-related
request failures.
- Preserved direct `send_error` calls where they remain the simplest
boundary for pending async event responses.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::thread_rollback --
--test-threads=1`
### Summary
- `thread/list` filtered filesystem results already overlay state DB
metadata, but the existing merge only filled missing git fields.
- Prefer non-null SQLite git metadata over stale non-null rollout values
so persisted branch/SHA/origin updates are reflected in filtered thread
lists.
- Update the focused merge test to cover stale filesystem git metadata
being replaced by state-backed values.
### Testing
now getting expected icons
<img width="426" height="913" alt="Screenshot 2026-04-27 at 1 45 45 PM"
src="https://github.com/user-attachments/assets/027fb7e7-f54d-4353-8423-cb76f3c8f5ac"
/>
## Why
The thread start handler mixed request validation, thread construction,
dynamic-tool validation, and JSON-RPC error emission in one nested flow.
Returning request errors from the helper path makes the successful setup
path easier to follow.
## What Changed
- Reworked `thread/start` handling in
`codex-rs/app-server/src/codex_message_processor.rs` so helper methods
return `Result` and the handler emits one result.
- Moved dynamic-tool validation failures into returned JSON-RPC errors
instead of local `send_error` branches.
- Preserved the existing thread creation and task-spawning behavior.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::dynamic_tools --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::turn_start --
--test-threads=1`
## Why
The experimental `PermissionProfile` API had both `:cwd` and
`:project_roots` special filesystem paths, which made the permission
root ambiguous. This PR removes the unstable `current_working_directory`
special path before the permissions API is stabilized, so callers use
`:project_roots` for symbolic project-root access.
## What changed
- Removes `FileSystemSpecialPath::CurrentWorkingDirectory` from protocol
and app-server protocol models, plus regenerated app-server
JSON/TypeScript schemas.
- Replaces internal `:cwd` permission entries with `:project_roots`
entries.
- Keeps the existing cwd-update behavior for legacy-shaped
workspace-write profiles, while removing the deleted
`CurrentWorkingDirectory` case from that compatibility path.
- Keeps `PermissionProfile::workspace_write()` as the reusable symbolic
workspace-write helper, with docs noting that `:project_roots` entries
resolve at enforcement time.
- Updates app-server docs/examples and approval UI labeling to stop
advertising `:cwd` as a permission token.
## Compatibility
Persisted rollout items may contain the old
`{"kind":"current_working_directory"}` tag from earlier experimental
`permissionProfile` snapshots. This PR keeps that tag as a
deserialize-only alias for `ProjectRoots { subpath: None }`, while
continuing to serialize only the new `project_roots` tag.
## Follow-up
This PR intentionally does not introduce an explicit project-root set on
`SessionConfiguration` or runtime sandbox resolution. Today, the
resolver still uses the active cwd as the single implicit project root.
A follow-up should model project roots separately from tool cwd so
`:project_roots` entries can resolve against the configured project
roots, and resolve to no entries when there are no project roots.
## Verification
- `cargo test -p codex-protocol permissions:: --lib`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-sandboxing -p codex-exec-server --lib`
- `cargo test -p codex-core session_configuration_apply_ --lib`
- `cargo test -p codex-app-server
command_exec_permission_profile_project_roots_use_command_cwd --test
all`
- `cargo test -p codex-tui
thread_read_session_state_does_not_reuse_primary_permission_profile
--lib`
- `cargo test -p codex-tui
preset_matching_accepts_workspace_write_with_extra_roots --lib`
- `cargo test -p codex-config --lib`
## Why
Fixes#19702.
The TUI markdown renderer could visually attach the next list marker to
a fenced code block inside the previous list item, even when the source
markdown included a blank line before the next item. That made
block-heavy loose lists harder to read, while the desired behavior is
still to keep simple lists compact.
## What changed
- Track whether the current rendered list item contains a code block.
- Preserve one blank separator before the following list marker only
when the previous item contained a code block.
- Add regression coverage for both paths: code-block list items keep the
separator, and simple loose list items stay compact.
## Verification
- `cargo test -p codex-tui markdown_render`
I also manually verified that the bug exists before and is fixed after.
## Before
<img width="437" height="240" alt="Screenshot 2026-04-26 at 1 19 01 PM"
src="https://github.com/user-attachments/assets/3bc9d64d-2dba-40d9-9d6b-a1d0b3c0f728"
/>
## After
<img width="410" height="269" alt="Screenshot 2026-04-26 at 1 18 54 PM"
src="https://github.com/user-attachments/assets/19c15bee-da32-455e-a7cb-e05eb85f4ea0"
/>
## Why
Fixes#7744. Approval modals can currently appear while the user is
typing ahead in the TUI composer, which lets plain letters like `y` or
`a` get consumed as approval shortcuts instead of staying in the draft
input.
## What changed
- Track recent composer typing activity in `bottom_pane/mod.rs`.
- Delay new approval overlays for 1 second while the composer is active,
keeping delayed requests queued until the user is idle.
- Preserve the existing active-overlay behavior so approvals that arrive
while an approval modal is already open are still queued into that
overlay.
- Prune delayed approvals when app-server resolution says the request
has already been handled.
## Verification
Added unit coverage for immediate approvals, delayed approvals, idle
deadline reset, typed shortcut letters staying in the composer, shortcut
handling after the delay, and resolved delayed-request pruning.
Focused `codex-tui` test groups pass locally. The full `cargo test -p
codex-tui` run currently aborts in
`app::tests::attach_live_thread_for_selection_rejects_unmaterialized_fallback_threads`;
that same test also fails when run alone with the same stack overflow.
Manual reviewer check:
1. Start the TUI from the repo root:
```bash
RUST_LOG=trace just codex \
-c log_dir=<temp-log-dir> \
--ask-for-approval untrusted \
--sandbox workspace-write
```
2. Submit this prompt:
```text
create a file text.txt on my desktop
```
3. While the agent is preparing the approval request, immediately type
text such as `ya this should stay in the composer`.
4. Confirm the typed-ahead `y`/`a` remains in the composer instead of
approving the request.
5. Stop typing for about 1 second; the approval modal should then
appear.
6. Once the modal is visible, press `y` and confirm the approval
shortcut works normally.
## Why
`codex resume` regressed after
[#18502](https://github.com/openai/codex/pull/18502) changed the default
`thread/list` scan-and-repair path for metadata-filtered listings. The
TUI resume picker uses `thread/list` with source/provider/cwd filters
and `useStateDbOnly: false`, which is the intended
correctness-preserving mode: it should still consult the filesystem so
healthy, missing, or stale SQLite state can be repaired.
The regression was that #18502 made that filtered, filesystem-backed
path call `reconcile_rollout` for every filesystem hit, and then call it
again for each SQLite hit. When `reconcile_rollout` does not already
have extracted rollout items, it falls back to loading the full JSONL
rollout. That changed the resume picker’s first page from a cheap
rollout-head scan plus SQLite read-repair into full-file reads for large
sessions, so a few long threads could dominate TUI startup/resume
latency.
This change addresses the regression by keeping `useStateDbOnly: false`
on the correctness-preserving path while avoiding unnecessary full JSONL
reads for rows the filesystem scan has already validated.
Source/provider/cwd filters can be decided from rollout-head metadata,
so non-search resume listings only need the lightweight read-repair path
for filesystem hits. Full reconciliation is still used for DB-only
filtered rows because those can be stale false positives, and for search
listings because search can depend on title metadata that may require
scanning the full rollout.
This fixes#19483.
## What changed
- For non-search filtered listings, repair filesystem hits with the
lightweight `read_repair_rollout_path` path instead of full
`reconcile_rollout`.
- Track thread IDs proven by the filesystem scan and only fully
reconcile SQLite-filtered hits that the filesystem scan did not return,
preserving stale-DB false-positive cleanup without full-reading every
healthy rollout.
- Leave search listings on full reconciliation, since search depends on
full title metadata rather than only source/provider/cwd metadata from
the rollout head.
## Verification
- `cargo test -p codex-rollout list_threads`
- `cargo test -p codex-app-server thread_list`
Clamp original-detail image patch estimates to the current 10k patch
budget so large images cannot inflate local context accounting without
bound. Add regression coverage for an over-budget image.
Fixesopenai/codex#19806.
fixes#19486
### Problem
Right now dynamic deferred tools are filtered at normal-turn prompt
building time, rather than upstream while building the `ToolRouter`
itself. This causes issues because dynamic deferred tools are then
wrongly included in the router's `model_visible_specs`, which is what
the compaction request-building flow relies on.
### Fix
Move the dynamic deferred tool filtering to `ToolRouter` creation time
to solve this problem for every request that relies on `ToolRouter` for
`model_visible_specs`, which solves the issue generically.
### Tests
Added unit + integration tests to ensure dynamic deferred tools are
omitted from `model_visible_specs` and compaction request respectively.
Tested against live `/compact` endpoint; raw deferred dynamic tools
without `tool_search` returned `400` (current bug), while the filtered
payload (this fix) returns `200`.
## Why
Account login/logout and command exec handlers were doing local error
sends in the middle of each handler. That made these request flows
branch heavily even though most of the logic is validate, perform the
operation, and return the response.
## What Changed
- Converted ChatGPT/API-key login, login cancel, logout, rate-limit, and
add-credit handlers in
`codex-rs/app-server/src/codex_message_processor.rs` to compute `Result`
values and send them once at the request boundary.
- Applied the same shape to command exec start/write/resize/terminate
handlers.
- Kept side-effect notifications in the same places after successful
request handling.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::account --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::command_exec --
--test-threads=1`
## Why
All Bazel CI jobs are currently blocked in the `setup-bazelisk` step
while trying to download Bazelisk.
[`bazelbuild/setup-bazelisk`](https://github.com/bazelbuild/setup-bazelisk)
is archived, and its README now recommends migrating to
[`bazel-contrib/setup-bazel`](https://github.com/bazel-contrib/setup-bazel),
so leaving our workflows on the archived action leaves CI exposed to
exactly this sort of outage.
Because `v8-canary` now consumes the shared local `setup-bazel-ci`
action, that workflow also needs to trigger when the action changes.
Without that follow-up, Bazel bootstrap regressions specific to the V8
canary path could be skipped by the workflow path filters.
## What Changed
- Switched `.github/actions/setup-bazel-ci/action.yml` from
`bazelbuild/setup-bazelisk` to `bazel-contrib/setup-bazel`, pinned to
`0.19.0`.
- Left `bazelisk-version` unset so GitHub-hosted runners can use their
preinstalled Bazelisk instead of downloading `1.x` at job start.
- Updated `.github/workflows/rusty-v8-release.yml` and
`.github/workflows/v8-canary.yml` to use the shared `setup-bazel-ci`
action instead of referencing `setup-bazelisk` directly.
- Added `.github/actions/setup-bazel-ci/**` to the `pull_request` and
`push` path filters in `.github/workflows/v8-canary.yml` so changes to
the shared Bazel setup action still run the canary workflow.
- Kept the existing repository-cache and Windows-specific Bazel setup
logic intact.
This keeps Bazel version selection anchored by `.bazelversion` while
removing the failing dependency on the archived setup action.
## Verification
- Searched `.github/` to confirm there are no remaining `setup-bazelisk`
references.
- Parsed the updated workflow and action YAML locally with Ruby's
`YAML.load_file`.
## Why
The `build-test` workflow stages a representative `codex` npm tarball by
asking `scripts/stage_npm_packages.py` to look up a past `rust-release`
run for a hardcoded release version. That started failing in CI because
the representative version in `.github/workflows/ci.yml` was stale:
- the workflow was still using `0.115.0`
- `stage_npm_packages.py` resolves native artifacts by looking for a
`rust-release` run on the `rust-v<version>` branch
- that lookup no longer found a matching run for `rust-v0.115.0`, so the
smoke test failed before it could stage the package
This PR makes that smoke test depend on a known-good recent release run
instead of an older branch lookup that is no longer reliable.
## What Changed
- Updated the representative release version in
`.github/workflows/ci.yml` from `0.115.0` to `0.125.0`.
- Added an explicit `WORKFLOW_URL` pointing at a recent successful
`rust-release` run:
`https://github.com/openai/codex/actions/runs/24901475298`.
- Passed that URL to `scripts/stage_npm_packages.py` via
`--workflow-url` so the job can reuse the expected native artifacts
directly instead of relying on `gh run list --branch rust-v<version>` to
discover them.
That keeps the npm staging smoke test representative while making it
less sensitive to older release branch history disappearing from the
GitHub Actions lookup path.
## Verification
- Inspected the failing CI log from `build-test` and confirmed the
failure came from `scripts/stage_npm_packages.py` being unable to
resolve `rust-v0.115.0`.
- Confirmed that
`https://github.com/openai/codex/actions/runs/24901475298` is a
successful `rust-release` run for `rust-v0.125.0`.
## Summary
Auth loading used to expose synchronous construction helpers in several
places even though some auth sources now need async work. This PR makes
the auth-loading surface async and updates the callers to await it.
This is intentionally only plumbing. It does not change how
AgentIdentity tokens are decoded, how task runtime ids are allocated, or
how JWT signatures are verified.
## Stack
1. **This PR:** [refactor: make auth loading
async](https://github.com/openai/codex/pull/19762)
2. [refactor: load AgentIdentity runtime
eagerly](https://github.com/openai/codex/pull/19763)
3. [feat: verify AgentIdentity JWTs with
JWKS](https://github.com/openai/codex/pull/19764)
## Important call sites
| Area | Change |
| --- | --- |
| `codex-login` auth loading | `CodexAuth` and `AuthManager`
construction paths now await auth loading. |
| app-server startup | Auth manager construction is awaited during
initialization. |
| CLI/TUI/exec/MCP/chatgpt callers | Existing auth-loading calls now
await the same behavior. |
| cloud requirements storage loader | The loader becomes async so it can
share the same auth construction path. |
| auth tests | Tests that load auth now run in async contexts. |
## Testing
Tests: targeted Rust auth test compilation, formatter, scoped Clippy
fix, and Bazel lock check.
## Why
The plugin, app, and skills handlers had a lot of repeated
`send_error`/`return` branches that made the success path hard to scan.
This slice keeps behavior the same while moving fallible steps into
local response-producing helpers, so the request boundary can send one
result.
## What Changed
- Converted plugin list/install/uninstall handlers in
`codex-rs/app-server/src/codex_message_processor/plugins.rs` to return
`Result<*Response, JSONRPCErrorError>` from helper methods and call
`send_result` once.
- Added local error-mapping helpers for plugin install/uninstall and
marketplace failures.
- Applied the same mechanical shape to app list, skills list/config, and
marketplace add/remove/upgrade handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.
## Verification
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::app_list --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::plugin_ --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::skills_list --
--test-threads=1`
## Why
Fixes#19632.
When a delegated agent requests approval for an in-progress file change,
the parent TUI handles that request from an inactive thread. The app
server already sent the `FileChange` item with the proposed diff, but
the inactive-thread approval path was not recovering and rendering it
the same way as the active-thread path.
The result was an inconsistent approval prompt: main-thread edits show a
normal patch preview history item before the approval modal, while
delegated edits did not show that preview in the transcript flow.
## What Changed
- Recover buffered or historical `FileChange` item changes when building
inactive-thread file-change approval requests.
- Reuse the app-server file-change conversion helper for both live
transcript rendering and inactive-thread approvals.
- Render recovered delegated patches as a normal patch preview history
cell before the approval modal.
- Keep apply-patch approval modals focused on the decision prompt and
optional metadata; they do not render a synthetic command line or embed
the diff body.
## Manual Repro And Verification
I manually reproduced the issue using a file under `~/Desktop` so the
write would require approval.
Before the fix:
1. Ask the main thread: `Use apply_patch, not shell redirection or
Python, to create ~/Desktop/bug1.txt with three short lines.`
2. Observe the expected TUI shape: the transcript shows a normal patch
preview such as `• Added ~/Desktop/bug1.txt (+N -0)` above the approval
modal, and the modal contains only the approval prompt/options without a
synthetic command line.
3. Ask for the delegated path: `Spawn a worker. Have it use apply_patch,
not shell redirection or Python, to create ~/Desktop/bug1.txt with four
short lines.`
4. Observe the delegated approval is inconsistent: the parent view does
not render the proposed patch as the normal transcript preview before
the modal, so the diff context is missing from the stream or appears
inside the modal instead of in the history flow.
After the fix:
1. Repeat the delegated worker prompt with `apply_patch`.
2. Confirm the parent view renders the same normal patch preview history
cell (`• Added ~/Desktop/bug1.txt (+N -0)` plus the diff) immediately
before the approval modal.
3. Confirm the approval modal remains focused on the decision prompt.
For delegated approvals it may show the worker thread label, but it
should not show a `$ apply_patch` command line or embed the diff body in
the modal.
## Why
`!` shell commands are currently surfaced as "Bash mode", which is
misleading for users running shells such as PowerShell or zsh. Those
commands also bypass the persistent prompt history path, so they cannot
be recalled after starting a new session.
Fixes#19613.
## What changed
- Rename the TUI footer label and related test wording from "Bash mode"
to "Shell mode".
- Persist accepted `!` shell commands to prompt history with the leading
`!`, so recall restores the composer into shell mode across sessions.
- Add coverage for immediate and queued shell-command submissions
emitting the prompt-history update.
## Verification
- `cargo test -p codex-tui bang_shell`
- `cargo test -p codex-tui shell_command_uses_shell_accent_style`
- `cargo test -p codex-tui footer_mode_snapshots`
- `cargo insta pending-snapshots --manifest-path tui/Cargo.toml`
Manually verified fix after confirming presence of bug prior to fix.
## Why
Fixes#19508.
In a fresh TUI session, pressing `Esc` twice entered the rewind
transcript overlay even though there was no user message to rewind to.
That produced an empty header-only transcript view and exposed a rewind
flow that could not select a valid target.
## What changed
The backtrack flow now checks whether a user-message rewind target
exists before opening the transcript preview. If no target exists, Codex
stays in the main TUI and shows `No previous message to edit.` instead
of opening an empty overlay.
The same guard applies when starting rewind preview from the transcript
overlay, and the first `Esc` no longer advertises the “edit previous
message” hint when there is no previous message available.
Snapshot coverage was added for the unavailable rewind info message,
along with a small target-detection test.
## Why
Phase 2 can now claim the global consolidation lock on startup even when
the git-backed memory workspace is already clean. The clean-workspace
path still finalized through the normal Phase 2 success path, which
clears and re-marks `selected_for_phase2` rows. That made no-op startups
perform avoidable writes to `stage1_outputs`, creating unnecessary DB
I/O and contention when no memory files changed.
## What Changed
- Added a preserving-selection Phase 2 finalizer in `codex-state` that
only marks the global job row as succeeded.
- Kept the existing `mark_global_phase2_job_succeeded` behavior for real
consolidation runs, where the selected Phase 2 snapshot must be
rewritten.
- Switched the `succeeded_no_workspace_changes` branch in
`core/src/memories/phase2.rs` to use the preserving-selection finalizer.
- Added a regression test that installs a SQLite trigger on
`stage1_outputs` and verifies the clean finalizer performs zero updates
there.
## Testing
- `cargo test -p codex-state`
- `cargo test -p codex-core memories::tests::phase2`
## Why
The Phase 2 memories job row is only the global lock for the git-backed
memory workspace. Manual memory edits do not enqueue new Stage 1 work,
so a Phase 2 row with `retry_remaining = 0` could be skipped before the
worker ever claimed the lock and generated `phase2_workspace_diff.md`.
That left workspace-only changes unconsolidated after repeated failures,
even when retry backoff had elapsed and the filesystem had real diffable
work.
## What Changed
- Allow `try_claim_global_phase2_job` to claim the Phase 2 lock after
the retry budget is exhausted, while still respecting active `retry_at`
backoff and fresh running leases.
- Treat `SkippedRetryUnavailable` for Phase 2 as backoff-only, and
update the outcome docs to match.
- Clamp Phase 2 retry bookkeeping at zero when failed attempts are
recorded.
## Verification
- Added
`phase2_global_lock_can_be_claimed_after_retry_budget_is_exhausted` to
cover the exhausted-budget lock claim path.
- Ran `cargo test -p codex-state`.
## Why
This PR make the `morpheus` agent (memory phase 2) use a git diff to
start it's consolidation. The workflow is the following:
1. The agent acquire a lock
2. If `.codex/memories` does not exist or is not a git root, initialize
everything (and make a first empty commit)
3. Update `raw_memories.md` and `rollout_summaries/` as before.
Basically we select max N phase 1 memories based on a given policy
4. We use git (`gix`) to get a diff between the current state of
`.codex/memories` and the last commit.
5. Dump the diff in `phase2_workspace_diff.md`
6. Spawn `morpheus` and point it to `phase2_workspace_diff.md`
7. Wait for `morpheus` to be done
8. Re-create a new `.git` and make one single commit on it. We do this
because we don't want to preserve history through `.git` and this is
cheap anyway
9. We release the lock
On top of this, we keep the retry policies etc etc
The goals of this new workflow are:
* Better support of any memory extensions such as `chronicle`
* Allow the user to manually edit memories and this will be considered
by the phase 2 agent
As a follow-up we will need to add support for user's edition while
`morpheus` is running
## What Changed
- Added memory workspace helpers that prepare the git baseline, compute
the diff, write `phase2_workspace_diff.md`, and reset the baseline after
successful consolidation.
- Updated Phase 2 to sync current inputs into `raw_memories.md` and
`rollout_summaries/`, prune old extension resources, skip clean
workspaces, and run the consolidation subagent only when the workspace
has changes.
- Tightened Phase 2 job ownership around long-running consolidation with
heartbeats and an ownership check before resetting the baseline.
- Simplified the prompt and state APIs so DB watermarks are bookkeeping,
while workspace dirtiness decides whether consolidation work exists.
- Updated the memory pipeline README and tests for workspace diffs,
extension-resource cleanup, pollution-driven forgetting, selection
ranking, and baseline persistence.
## Verification
- Added/updated coverage in `core/src/memories/tests.rs`,
`core/src/memories/workspace_tests.rs`, `state/src/runtime/memories.rs`,
and `core/tests/suite/memories.rs`.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`features.multi_agent_v2.max_concurrent_threads_per_session` is meant to
be the MultiAgentV2-specific session thread cap: it counts the root
thread and all open subagent threads. The previous implementation kept
this surface tied to `agents.max_threads`, which made it a global
subagent-only cap and allowed the legacy setting to coexist with
MultiAgentV2.
## What Changed
- Added `max_concurrent_threads_per_session` to
`[features.multi_agent_v2]` with default `4`.
- Removed the `[agents] max_concurrent_threads_per_session` alias to
`agents.max_threads`.
- When MultiAgentV2 is enabled, reject `agents.max_threads` and derive
the existing internal subagent slot limit as
`max_concurrent_threads_per_session - 1`.
- Regenerated `core/config.schema.json` and added coverage for the new
config semantics.
## Result
```
➜ codex git:(jif/clean-multi-agent-v2-config) codex -c features.multi_agent_v2.enabled=true -c features.multi_agent_v2.max_concurrent_threads_per_session=3
╭────────────────────────────────────────────────────╮
│ >_ OpenAI Codex (v0.0.0) │
│ │
│ model: gpt-5.5 xhigh fast /model to change │
│ directory: ~/code/codex │
╰────────────────────────────────────────────────────╯
Tip: Update Required - This version will no longer be supported starting May 8th. Please upgrade to the latest version (https://github.com/openai/codex/releases/latest) using your preferred package manager.
› Can you try to spawn 4 agents
• I’ll try to start four lightweight agents at once and report exactly what the runtime accepts.
• Spawned Russell [no-apps] (gpt-5.5 xhigh)
└ Spawn probe 1: reply briefly that you started, then wait for further instructions. Do not do any repo work.
• Spawned Descartes [no-apps] (gpt-5.5 xhigh)
└ Spawn probe 2: reply briefly that you started, then wait for further instructions. Do not do any repo work.
• Agent spawn failed
└ Spawn probe 3: reply briefly that you started, then wait for further instructions. Do not do any repo work.
• Agent spawn failed
└ Spawn probe 4: reply briefly that you started, then wait for further instructions. Do not do any repo work.
• The runtime accepted the first two and rejected the next two with agent thread limit reached. I’m checking whether the two accepted probes have returned cleanly, then I’ll close them if needed.
```
---------
Co-authored-by: Codex <noreply@openai.com>
Problem: Maintainers need a shared way to run Codex GitHub issue digests
without copying large prompts or relying on manual GitHub page
summaries.
Solution: Add a reusable codex-issue-digest skill with a deterministic
GitHub collector, owner/all-label windows, reaction-aware activity
metrics, scaled attention markers, and focused tests.
## Why
After config and requirements store canonical profiles, exec requests
should not cache a derived `SandboxPolicy`. The cached legacy value can
drift from the richer profile state, and most execution paths already
have the filesystem and network runtime policies they need.
## What Changed
- Removes `sandbox_policy` from `codex_sandboxing::SandboxExecRequest`
and `codex_core::sandboxing::ExecRequest`.
- Adds an on-demand `ExecRequest::compatibility_sandbox_policy()` helper
for the Windows and legacy call sites that still need a `SandboxPolicy`
projection.
- Updates Windows filesystem override setup and unified exec policy
serialization to derive that compatibility policy at the boundary.
- Updates Unix escalation reruns and direct shell requests to
reconstruct exec requests from `PermissionProfile` plus runtime
filesystem/network policy, without carrying a cached legacy policy.
- Adjusts sandboxing manager tests to assert the effective profile
rather than the removed legacy field.
## Verification
- `cargo check -p codex-config -p codex-core -p codex-sandboxing -p
codex-app-server -p codex-cli -p codex-tui`
- `cargo test -p codex-sandboxing manager`
- `cargo test -p codex-core
exec_server_params_use_env_policy_overlay_contract`
- `cargo test -p codex-core unix_escalation`
- `cargo test -p codex-core exec::tests`
- `cargo test -p codex-core sandboxing::tests`
## Why
Auto-review can deny an action that the user later decides they want to
retry. Today there is no TUI surface for selecting a recent denial and
sending explicit approval context back into the session, so users have
to restate intent manually and the retry can be reviewed without the
original denied action context.
This adds a narrow TUI-driven path for approving a recent denied action
while still keeping the retry inside the normal auto-review flow.
## What Changed
- Added `/auto-review-denials` to open a picker of recent denied
auto-review actions.
- Added a small in-memory TUI store for the 10 most recent denied
auto-review events.
- Selecting a denial sends the structured denied event back through the
existing core/app-server op path.
- Core now injects a developer message containing the approved action
JSON rather than the full assessment event.
- Auto-review transcript collection now preserves this specific approval
developer message so follow-up review sessions can see the user approval
context.
- Added TUI snapshot/unit coverage for the picker and approval dispatch
path.
- Added core coverage for retaining the approval developer message in
the auto-review transcript.
## Verification
- `cargo test -p codex-core
collect_guardian_transcript_entries_keeps_manual_approval_developer_message`
- `cargo test -p codex-tui auto_review_denials`
- `cargo test -p codex-tui
approving_recent_denial_emits_structured_core_op_once`
## Notes
This intentionally keeps retries going through auto-review. The approval
signal is context for the exact previously denied action, not a blanket
bypass for similar future actions.
## Why
The remaining migration work still needs `SandboxPolicy` at a few
compatibility boundaries, but those projections should come from one
canonical path. Keeping ad hoc legacy projections scattered through
app-server, CLI, and config code makes it easy for behavior to drift as
`PermissionProfile` gains fidelity that the legacy enum cannot
represent.
## What Changed
- Adds `Permissions::legacy_sandbox_policy(cwd)` and
`Config::legacy_sandbox_policy()` as the compatibility projection from
the canonical `PermissionProfile`.
- Adds `Permissions::can_set_legacy_sandbox_policy()` so legacy inputs
are checked after they are converted into profile semantics.
- Updates app-server command handling, Windows sandbox setup, session
configuration, and sandbox summaries to use the centralized projection
helper.
- Leaves `SandboxPolicy` in place only for boundary inputs/outputs that
still speak the legacy abstraction.
## Verification
- `cargo check -p codex-config -p codex-core -p codex-sandboxing -p
codex-app-server -p codex-cli -p codex-tui`
- `cargo test -p codex-tui
permissions_selection_history_snapshot_full_access_to_default --
--nocapture`
- `cargo test -p codex-tui
permissions_selection_sends_approvals_reviewer_in_override_turn_context
-- --nocapture`
- `bazel test //codex-rs/tui:tui-unit-tests-bin
--test_arg=permissions_selection_history_snapshot_full_access_to_default
--test_output=errors`
- `bazel test //codex-rs/tui:tui-unit-tests-bin
--test_arg=permissions_selection_sends_approvals_reviewer_in_override_turn_context
--test_output=errors`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19734).
* #19737
* #19736
* #19735
* __->__ #19734
# Why
Requirements support host-specific
`remote_sandbox_config.hostname_patterns`, but config loading previously
resolved and passed the system hostname through every config-loading
path even when no requirements layer used `remote_sandbox_config`. On
machines where hostname lookup is slow, startup and app-server config
reads paid for a feature that was not active.
We only need the hostname when a requirements layer actually declares
`remote_sandbox_config`, so this moves hostname resolution to the single
requirements merge point and keeps all other config callers unaware of
hostname matching.
# What
- Removed the eager `host_name` plumbing from
`load_config_layers_state`, `load_requirements_toml`, `ConfigBuilder`,
app-server `ConfigManager`, network proxy loading, and related call
sites.
- Resolve the hostname inside
`merge_requirements_with_remote_sandbox_config` only when the incoming
requirements contain `remote_sandbox_config`.
## Why
Several execution paths still converted profile-backed permissions into
`SandboxPolicy` and then rebuilt runtime permissions from that legacy
shape. Those round trips are unnecessary after the preceding PRs and can
lose split filesystem semantics. Core approval and escalation should
carry the resolved profile directly.
## What Changed
- Removes `sandbox_policy` from `ResolvedPermissionProfile`; the
resolved permission object now carries the canonical `PermissionProfile`
directly.
- Updates exec-policy fallback, shell/unified-exec interception,
escalation reruns, and related tests to pass profiles instead of legacy
policies.
- Removes legacy additional-permission merge helpers that built an
effective `SandboxPolicy` before rebuilding runtime permissions.
- Keeps legacy projections only at compatibility boundaries that still
require `SandboxPolicy`, not in core permission computation.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19394).
* #19737
* #19736
* #19735
* #19734
* #19395
* __->__ #19394
## Summary
Increase `core-all-test`'s Bazel shard count from `8` to `16`.
## Why
[#19609](https://github.com/openai/codex/pull/19609) restored
`bazel.yml` to a 30-minute timeout and increased `app-server-all-test`'s
shard count because the bigger timeout risk was not just a cold Windows
build. The more common problem was a long `rust_test()` shard failing
and getting retried multiple times.
Recent `main` runs show that `//codex-rs/core:core-all-test` still has
the same shape of problem on Windows:
- [Run
24943931330](https://github.com/openai/codex/actions/runs/24943931330)
reported `//codex-rs/core:core-all-test` as flaky after first-attempt
failures in shard `5/8` and shard `8/8`.
- Those retries were driven by
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override`
and
`suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request`.
- The failed shard attempts in that run took `272.61s` and `259.27s`
before retrying, which is exactly the sort of wall-clock cost that burns
through the 30-minute budget.
- [Run
24966332583](https://github.com/openai/codex/actions/runs/24966332583)
also retried `//codex-rs/tui:tui-unit-tests` after
`app::tests::update_memory_settings_updates_current_thread_memory_mode`
failed once on Windows.
- [Run
24965527138](https://github.com/openai/codex/actions/runs/24965527138)
and its linked [BuildBuddy
invocation](https://app.buildbuddy.io/invocation/ac1a8265-06fa-4da5-9552-4715b7965bce)
show the other half of the problem: when Windows cache reuse is weak,
the `bazel test //...` step can already consume `24m11s` on its own,
leaving very little headroom for flaky retries.
Increasing `core-all-test` to `16` shards does not fix the flaky tests,
but it does reduce the wall-clock cost when a single shard has to be
retried. That matches the mitigation we already applied to
`app-server-all-test` in `#19609`.
## What Changed
- Update `codex-rs/core/BUILD.bazel` so `core-all-test` uses `16` shards
instead of `8`.
- Leave `core-unit-tests` unchanged.
## Follow-up Work
This change is meant to buy back CI headroom while we fix the flaky
tests themselves in subsequent commits. The recent Windows retries that
look worth addressing directly include:
-
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override`
-
`suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request`
-
`app::tests::update_memory_settings_updates_current_thread_memory_mode`
## Verification
- Compared `core-all-test`'s current sharding against the
`app-server-all-test` precedent in
[#19609](https://github.com/openai/codex/pull/19609).
- Inspected recent `main` Bazel workflow logs and the linked BuildBuddy
invocation to confirm that Windows retries on long shards are still
consuming a meaningful fraction of the 30-minute timeout budget.
- Did not run local tests for this change because it only adjusts Bazel
sharding metadata.
## Why
Runtime decisions should not infer permissions from the lossy legacy
sandbox projection once `PermissionProfile` is available. In particular,
`Disabled` and `External` need to remain distinct, and managed profiles
with split filesystem or deny-read rules should not be collapsed before
approval, network, safety, or analytics code makes decisions.
## What Changed
- Changes managed network proxy setup and network approval logic to use
`PermissionProfile` when deciding whether a managed sandbox is active.
- Migrates patch safety, Guardian/user-shell approval paths, Landlock
helper setup, analytics sandbox classification, and selected
turn/session code to profile-backed permissions.
- Validates command-level profile overrides against the constrained
`PermissionProfile` rather than a strict `SandboxPolicy` round trip.
- Preserves configured deny-read restrictions when command profiles are
narrowed.
- Adds coverage for profile-backed trust, network proxy/approval
behavior, patch safety, analytics classification, and command-profile
narrowing.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19393).
* #19395
* #19394
* __->__ #19393
## Why
Config loading had become split across crates: `codex-config` owned the
config types and merge logic, while `codex-core` still owned the loader
that assembled the layer stack. This change consolidates that
responsibility in `codex-config`, so the crate that defines config
behavior also owns how configs are discovered and loaded.
To make that move possible without reintroducing the old dependency
cycle, the shell-environment policy types and helpers that
`codex-exec-server` needs now live in `codex-protocol` instead of
flowing through `codex-config`.
This also makes the migrated loader tests more deterministic on machines
that already have managed or system Codex config installed by letting
tests override the system config and requirements paths instead of
reading the host's `/etc/codex`.
## What Changed
- moved the config loader implementation from `codex-core` into
`codex-config::loader` and deleted the old `core::config_loader` module
instead of leaving a compatibility shim
- moved shell-environment policy types and helpers into
`codex-protocol`, then updated `codex-exec-server` and other downstream
crates to import them from their new home
- updated downstream callers to use loader/config APIs from
`codex-config`
- added test-only loader overrides for system config and requirements
paths so loader-focused tests do not depend on host-managed config state
- cleaned up now-unused dependency entries and platform-specific cfgs
that were surfaced by post-push CI
## Testing
- `cargo test -p codex-config`
- `cargo test -p codex-core config_loader_tests::`
- `cargo test -p codex-protocol -p codex-exec-server -p
codex-cloud-requirements -p codex-rmcp-client --lib`
- `cargo test --lib -p codex-app-server-client -p codex-exec`
- `cargo test --no-run --lib -p codex-app-server`
- `cargo test -p codex-linux-sandbox --lib`
- `cargo shear`
- `just bazel-lock-check`
## Notes
- I did not chase unrelated full-suite failures outside the migrated
loader surface.
- `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive
failures on this machine, and Windows CI still shows unrelated
long-running/timeouting test noise outside the loader migration itself.
## Why
App-server request handling had a lot of repeated JSON-RPC error
construction and one-off `send_error`/`return` branches. This made small
handlers noisy and pushed error response details into leaf code that
otherwise only needed to validate input or call the underlying API.
## What Changed
- Added shared JSON-RPC error constructors in
`codex-rs/app-server/src/error_code.rs`.
- Lifted straightforward request result emission into
`codex-rs/app-server/src/message_processor.rs` so response/error
dispatch happens at the request boundary.
- Reused the result helpers across command exec, config, filesystem,
device-key, external-agent config, fs-watch, and outgoing-message paths.
- Removed leaf wrapper handlers where the method body was only
forwarding to a response helper.
- Returned request validation errors upward in the simple cases instead
of sending an error locally and immediately returning.
## Verification
- `cargo test -p codex-app-server --lib command_exec::tests`
- `cargo test -p codex-app-server --lib outgoing_message::tests`
- `cargo test -p codex-app-server --lib in_process::tests`
- `cargo test -p codex-app-server --test all v2::fs`
- `cargo test -p codex-app-server --test all v2::config_rpc`
- `cargo test -p codex-app-server --test all v2::external_agent_config`
- `cargo test -p codex-app-server --test all v2::initialize`
- `just fix -p codex-app-server`
- `git diff --check`
Note: full `cargo test -p codex-app-server` was attempted and stopped in
`message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
with a stack overflow after unrelated tests had already passed.
## Why
After #19391, `PermissionProfile` and the split filesystem/network
policies could still be stored in parallel. That creates drift risk: a
profile can preserve deny globs, external enforcement, or split
filesystem entries while a cached projection silently loses those
details. This PR makes the profile the runtime source and derives
compatibility views from it.
## What Changed
- Removes stored filesystem/network sandbox projections from
`Permissions` and `SessionConfiguration`; their accessors now derive
from the canonical `PermissionProfile`.
- Derives legacy `SandboxPolicy` snapshots from profiles only where an
older API still needs that field.
- Updates MCP connection and elicitation state to track
`PermissionProfile` instead of `SandboxPolicy` for auto-approval
decisions.
- Adds semantic filesystem-policy comparison so cwd changes can preserve
richer profiles while still recognizing equivalent legacy projections
independent of entry ordering.
- Updates config/session tests to assert profile-derived projections
instead of parallel stored fields.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19392).
* #19395
* #19394
* #19393
* __->__ #19392
## Why
This supersedes #19391. During stack repair, GitHub marked #19391 as
merged into a temporary stack branch rather than into `main`, so the
runtime-config change needed a fresh PR.
`PermissionProfile` is now the canonical permissions shape after #19231
because it can distinguish `Managed`, `Disabled`, and `External`
enforcement while also carrying filesystem rules that legacy
`SandboxPolicy` cannot represent cleanly. Core config and session state
still needed to accept profile-backed permissions without forcing every
profile through the strict legacy bridge, which rejected valid runtime
profiles such as direct write roots.
The unrelated CI/test hardening that previously rode along with this PR
has been split into #19683 so this PR stays focused on the permissions
model migration.
## What Changed
- Adds `Permissions.permission_profile` and
`SessionConfiguration.permission_profile` as constrained runtime state,
while keeping `sandbox_policy` as a legacy compatibility projection.
- Introduces profile setters that keep `PermissionProfile`, split
filesystem/network policies, and legacy `SandboxPolicy` projections
synchronized.
- Uses a compatibility projection for requirement checks and legacy
consumers instead of rejecting profiles that cannot round-trip through
`SandboxPolicy` exactly.
- Updates config loading, config overrides, session updates, turn
context plumbing, prompt permission text, sandbox tags, and exec request
construction to carry profile-backed runtime permissions.
- Preserves configured deny-read entries and `glob_scan_max_depth` when
command/session profiles are narrowed.
- Adds `PermissionProfile::read_only()` and
`PermissionProfile::workspace_write()` presets that match legacy
defaults.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606).
* #19395
* #19394
* #19393
* #19392
* __->__ #19606
## Summary
This PR lets programmatic AgentIdentity users provide one token through
either stdin login or environment auth.
`codex login --with-agent-identity` reads an Agent Identity JWT from
stdin, validates that it has the required claims, and stores that token
as the `agent_identity` value in `auth.json`. The file format is
token-only; the decoded account and key fields are runtime state, not
hand-authored auth.json fields.
The Agent Identity JWT claim shape and decoder live in
`codex-agent-identity`; `codex-login` only owns env/storage precedence
and conversion into `CodexAuth::AgentIdentity`.
When env auth is enabled, `CODEX_AGENT_IDENTITY` can provide the same
JWT without writing auth state to disk. `CODEX_API_KEY` still wins if
both env vars are set.
Reference old stack: https://github.com/openai/codex/pull/17387/changes
Reference JWT/env stack: https://github.com/openai/codex/pull/18176
## Stack
1. https://github.com/openai/codex/pull/18757: full revert
2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
crate
3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity
auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate Codex backend
auth callsites through AuthProvider
5. This PR: accept AgentIdentity JWTs through login/env
## Testing
Tests: targeted login and Agent Identity crate tests, CLI checks, scoped
formatter/linter cleanup, and CI.
---------
Co-authored-by: Shijie Rao <shijie.rao@openai.com>
## Why
Windows Bazel runs in the permissions stack exposed that app-server
integration tests were launching normal plugin startup warmups in every
subprocess. Those warmups can call
`https://chatgpt.com/backend-api/plugins/featured` when a test is not
specifically exercising plugin startup, which adds slow background work,
noisy stderr, and dependence on external network state. The relevant
startup/featured-plugin behavior was introduced across #15042 and
#15264.
A few app-server tests also had long optional waits or unbounded cleanup
paths, making failures expensive to diagnose and contributing to slow
Windows shards. One external-agent config test from #18246 used a
GitHub-style marketplace source, which was enough to exercise the
pending remote-import path but also meant the background completion task
could attempt a real clone.
## What Changed
- Adds explicit `AppServerRuntimeOptions` / `PluginStartupTasks`
plumbing and a hidden debug-only
`--disable-plugin-startup-tasks-for-tests` app-server flag, so
integration tests can suppress startup plugin warmups without adding a
production env-var gate.
- Has the app-server test harness pass that hidden flag by default,
while opting plugin-startup coverage back in for tests that
intentionally exercise startup sync and featured-plugin warmup behavior.
- Lowers normal app-server subprocess logging from `info`/`debug` to
`warn` to avoid multi-megabyte stderr output in Bazel logs.
- Prevents the external-agent config test from attempting a real
marketplace clone by using an invalid non-local source while still
exercising the pending-import completion path.
- Bounds optional filesystem/realtime waits and fake WebSocket
test-server shutdown so failures produce targeted timeouts instead of
hanging a shard.
- Fixes the Unix script-resolution test in `rmcp-client` to exercise
PATH resolution directly and include the actual spawn error in failures.
## Verification
- `cargo check -p codex-app-server`
- `cargo clippy -p codex-app-server --tests -- -D warnings`
- `cargo test -p codex-rmcp-client
program_resolver::tests::test_unix_executes_script_without_extension`
- `cargo test -p codex-app-server --test all
external_agent_config_import_sends_completion_notification_after_pending_plugins_finish
-- --nocapture`
- `cargo test -p codex-app-server --test all
plugin_list_uses_warmed_featured_plugin_ids_cache_on_first_request --
--nocapture`
- Windows Local Bazel passed with this test-hardening bundle before it
was extracted from #19606.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19683).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19683
This removes the hidden `codex responses` CLI subcommand after
confirming no downstream callers rely on it, deleting the raw Responses
passthrough implementation, unregistering the subcommand, and dropping
the now-unused CLI dependencies on `codex-api` and
`codex-model-provider`.
Some providers of Responses API forward a model-defined `end_turn`
boolean indicating explicitly the model's indication of whether it would
like to end the turn or to be inferenced again. In this PR, we update
the sampling loop to use this field correctly if it's set. If the field
is not set by the provider, we fall back to the existing sampling logic.
Fixes multiple scrollback and terminal resize issues: #5538, #5576,
#8352, #12223, #16165, and #15380.
## Why
Codex writes finalized transcript output into terminal scrollback after
wrapping it for the current viewport width. A later terminal resize
could leave that scrollback shaped for the old width, so wider windows
kept narrow output and narrower windows could show stale wrapping
artifacts until enough new output replaced the visible area.
This is also the foundation PR for responsive markdown tables. Table
rendering needs finalized transcript content to be width-sensitive after
insertion, not only while content is first streaming. Markdown table
rendering itself stays in #18576.
## Stack
- PR1: resize backlog reflow and interrupt cleanup
- #18576: markdown table support
## What Changed
- Rebuild source-backed transcript history when the terminal width
changes. `terminal_resize_reflow` is introduced through the experimental
feature system, but is enabled by default for this rollout so we can
validate behavior across real terminals.
- Preserve assistant and plan stream source so finalized streaming
output can participate in resize reflow after consolidation.
- Debounce resize work, but force a final source-backed reflow when a
resize happened during active or unconsolidated streaming output.
- Clear stale pending history lines on resize so old-width wrapped
output is not emitted just before rebuilt scrollback.
- Bound replay work with `[tui.terminal_resize_reflow].max_rows`:
omitted uses terminal-specific defaults, `0` keeps all rendered rows,
and a positive value sets an explicit cap. The cap applies both while
initially replaying a resumed transcript into scrollback and when
rebuilding scrollback after terminal resize.
- Consolidate interrupted assistant streams before cleanup, then clear
pending stream output and active-tail state consistently.
- Move resize reflow and thread event buffering helpers out of `app.rs`
into dedicated TUI modules.
- Add focused coverage for resize reflow, feature-gated behavior,
streaming source preservation, interrupted output cleanup,
unicode-neutral text, terminal-specific row caps, and composer/layout
stability.
## Runtime Bounds
Resize reflow keeps only the most recent rendered rows when a row cap is
active. The default is `auto`, which maps to the detected terminal's
default scrollback size where Codex can identify it: VS Code `1000`,
Windows Terminal `9001`, WezTerm `3500`, and Alacritty `10000`.
Terminals without a dedicated mapping use the conservative fallback of
`1000` rows. Users can override this with `[tui.terminal_resize_reflow]
max_rows = N`, or set `max_rows = 0` to disable row limiting.
## Validation
- `just fmt`
- `git diff --check`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui reflow`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui
transcript_reflow`
- `just fix -p codex-tui`
- PR CI in progress on the squashed branch
## Why
For npm/Bun-managed installs, the update prompt was treating the latest
GitHub release as ready to install. During the `0.124.0` release, GitHub
and npm visibility were not atomic: the root npm wrapper could become
visible before the npm registry marked that version as the package
`latest`. That left a window where users could be prompted to upgrade
before npm was ready for the release.
## What changed
- Keep GitHub Releases as the candidate latest-version source for
npm/Bun installs, but only write the existing `version.json` cache after
npm registry metadata proves that same root version is ready.
- Add `codex-rs/tui/src/npm_registry.rs` to validate npm readiness by
checking `dist-tags.latest` and root package `dist` metadata for the
GitHub candidate version.
- Move version parsing helpers into
`codex-rs/tui/src/update_versions.rs` so that logic can be tested
without compiling the release-only `updates.rs` module under tests.
- Update `.github/workflows/rust-release.yml` so the six known platform
tarballs publish before the root `@openai/codex` wrapper. Other npm
tarballs publish before the root wrapper, and the SDK publishes after
the root package it depends on.
I think raising it to 45 minutes in
https://github.com/openai/codex/pull/19578 was a mistake for the reasons
explained in the comments in the code. Instead, we attempt to defend
against timeouts by increasing the number of shards in
`app-server-all-test` so that a "true failure" that gets run 3x should
not take as much wall clock time.
## Why
Windows can represent the same canonical local path with either a normal
drive path or a verbatim device path prefix. The failure pattern that
motivated this PR was an assertion diff like `C:\...` versus
`\\?\C:\...`: different spellings, same file.
That became visible while validating the permissions stack above this
PR. The stack increasingly routes paths through `AbsolutePathBuf`, which
normalizes supported Windows device prefixes, while several existing
tests still built expected values directly with
`std::fs::canonicalize()` or compared `AbsolutePathBuf::as_path()` to a
raw `PathBuf`. On Windows, that can make tests fail because the two
sides choose different textual forms for an otherwise equivalent
canonical path.
This PR is intentionally split out as the bottom PR below #19606. The
runtime permissions migration should not carry unrelated Windows test
stabilization, and reviewers should be able to verify this as a
test-only change before looking at the larger permissions changes.
## Failure Modes Covered
- `conversation_summary` expected rollout paths were built from raw
canonicalized `PathBuf`s, while app-server responses could carry
`AbsolutePathBuf`-normalized paths.
- `thread_resume` compared returned thread paths directly to previously
stored or fixture paths, so a verbatim-prefix spelling could fail an
otherwise correct resume.
- `marketplace_add` compared plugin install roots through `as_path()`
against raw canonicalized paths, reproducing the same `C:\...` versus
`\\?\C:\...` mismatch in both app-server and core-plugin coverage.
## What Changed
- In `app-server/tests/suite/conversation_summary.rs`, normalize both
expected rollout paths and received `ConversationSummary.path` values
through `AbsolutePathBuf` before comparing the full summary object.
- In `app-server/tests/suite/v2/thread_resume.rs`, normalize both sides
of thread path comparisons before asserting equality. This keeps the
tests focused on whether resume returned the same existing path, not
whether Windows used the same string spelling.
- In `app-server/tests/suite/v2/marketplace_add.rs` and
`core-plugins/src/marketplace_add.rs`, compare install roots as
`AbsolutePathBuf` values instead of comparing an absolute-path wrapper
to a raw canonicalized `PathBuf`.
## Behavior
This PR does not change production app-server or marketplace behavior.
It only changes tests to assert semantic path identity across Windows
path spelling variants. It also leaves API response values untouched;
the normalization happens inside assertions only.
## Verification
Targeted local checks run while extracting this fix:
- `cargo test -p codex-app-server
get_conversation_summary_by_thread_id_reads_rollout`
- `cargo test -p codex-app-server
get_conversation_summary_by_relative_rollout_path_resolves_from_codex_home`
- `cargo test -p codex-app-server
thread_resume_prefers_path_over_thread_id`
Windows-specific confidence comes from the Bazel Windows CI job for this
PR, since the failure is platform-specific.
## Docs
No docs update is needed because this is test-only infrastructure
stabilization.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19604).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19604
## Why
`sandbox_permissions = "require_escalated"` is treated as an explicit
request to approve the command and run it outside the
filesystem/platform sandbox. Before this change, shell and unified exec
still registered managed network approval context and could inject
Codex-managed proxy state into the child process, which meant an
approved escalated command could still hit a second network approval
path.
This PR makes that escalation boundary consistent: once a command is
explicitly approved to run outside the sandbox, Codex does not also
route that process through the managed network proxy.
## Security impact
Command/filesystem sandbox approval now implies network approval for
that command. If an untrusted command or script is allowed to run with
`require_escalated`, its network calls are unsandboxed: Codex-managed
network allowlists and denylists are not respected for that process, so
the command can exfiltrate any data it can read.
## What changed
- Skip managed network approval specs for
`SandboxPermissions::RequireEscalated`.
- Pass `network: None` into shell, zsh-fork shell, and unified exec
sandbox preparation for explicitly escalated requests.
- Strip Codex-managed proxy environment variables when
`CODEX_NETWORK_PROXY_ACTIVE` is present, while preserving user proxy env
when the Codex marker is absent.
- Add regression coverage for the prepared exec request so the old
behavior cannot silently reappear.
## Verification
- `cargo test -p codex-core explicit_escalation`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
## Why
Fixes#19499.
The slash-command popup recalculated the command-name column from only
the rows visible in the current viewport. That made the description
column shift horizontally while scrolling through `/` commands whenever
longer command names entered or left the visible window.
## What Changed
`codex-rs/tui/src/bottom_pane/command_popup.rs` now uses the shared
selection-popup `AutoAllRows` column-width mode for both height
measurement and rendering. This keeps the command description column
based on the full filtered slash-command list instead of the current
viewport.
## Verification
- `cargo test -p codex-tui bottom_pane::command_popup`
Follow-up to #19266.
## Why
`thread_start_with_non_local_thread_store_does_not_create_local_persistence`
is meant to catch accidental local thread persistence when a non-local
thread store is configured. The Windows flake reported in [this
BuildBuddy
invocation](https://app.buildbuddy.io/invocation/0b75dde4-6828-4e7b-a35b-e45b73fb005d)
showed that the assertion was tripping on an unexpected top-level `.tmp`
entry:
```diff
{
+ ".tmp",
"config.toml",
"installation_id",
"memories",
"skills",
}
```
That `.tmp` does not appear to come from `tempfile::TempDir`; it comes
from unrelated plugin startup work that can legitimately materialize
`codex_home/.tmp`, including the startup remote plugin sync marker in
[`core/src/plugins/startup_sync.rs`](bce74c70ce/codex-rs/core/src/plugins/startup_sync.rs (L13-L15))
and the curated plugin snapshot under
[`.tmp/plugins`](bce74c70ce/codex-rs/core-plugins/src/startup_sync.rs (L25-L26)).
That makes the regression race unrelated background startup tasks
instead of validating the thread-store invariant it was added to cover.
Rather than weakening the assertion to allow arbitrary `.tmp` entries,
this change isolates the test from plugin warmups so it can stay strict
about unexpected local thread persistence artifacts.
## What changed
- disable plugins in the generated config used by
`app-server/tests/suite/v2/remote_thread_store.rs`
- keep the existing `codex_home` assertions unchanged so the test still
fails if local session or sqlite persistence is introduced
## Verification
- `cargo test -p codex-app-server
suite::v2::remote_thread_store::thread_start_with_non_local_thread_store_does_not_create_local_persistence
-- --exact`
Fixes#15219.
## Why
`thread/resume` should continue a persisted thread with the same model
provider that created the thread. The app server already restores the
persisted model and reasoning effort before resuming, but it was leaving
`model_provider` unset. If a user created a thread with one provider and
later switched their active profile to another provider, resumed
encrypted history could be sent to the wrong endpoint and fail with
`invalid_encrypted_content`.
The thread metadata already records the original provider, so resume
should apply it when the caller has not explicitly requested a different
model/provider/reasoning configuration.
## What changed
This updates `merge_persisted_resume_metadata` in
`app-server/src/codex_message_processor.rs` to copy
`ThreadMetadata::model_provider` into `ConfigOverrides::model_provider`
alongside the persisted model.
The existing resume metadata tests now also assert that:
- the persisted provider is restored for normal resume
- explicit model, provider, or reasoning-effort overrides still prevent
persisted resume metadata from being applied
- a thread with no persisted model or reasoning effort still resumes
with its persisted provider
## Verification
- `cargo test -p codex-app-server` passed the app-server unit tests,
including the updated resume metadata coverage. The broader integration
portion of that command failed in an unrelated environment-sensitive
skills-budget warning assertion, where this run saw 8 omitted skills
instead of the expected 7.
- `just fix -p codex-app-server` completed successfully.
Unfortunately, if most of the build graph is invalidated such that there
are few cache hits, the Windows Bazel build for all the tests often
takes more than `30` minutes, so this PR increases the timeout to `45`
minutes until we set up distributed builds.
## Why
The visibility cleanup in the base PR reduced what `codex-mcp` exposes,
but several files still made reviewers read private support machinery
before the public or crate-facing entry points. This ordering pass makes
each file easier to scan: exported API first, crate-visible MCP
internals next, then private helpers in breadth-first order from the
higher-level MCP flows to leaf utilities.
## What Changed
- Reordered `codex-mcp` exports so the runtime, configuration, snapshot,
auth, and helper surfaces are grouped by visibility and reader
importance.
- Moved public and crate-visible MCP items ahead of private helpers in
the auth, MCP planning/snapshot, connection manager, and tool-name
modules.
- Kept the change mechanical, with no behavior changes intended.
## Verification
- `cargo check -p codex-mcp`
## Why
`codex-mcp` currently exposes more API than the rest of the workspace
uses. Some of that surface is simply visibility that can be tightened,
and some of it is public helper code that remains compiler-valid because
it is exported even though no workspace caller uses it.
That distinction matters: Rust does not warn on exported API just
because the current workspace does not call it. This PR intentionally
treats those exported-but-workspace-unreferenced paths as stale
`codex-mcp` surface. The main example is MCP skill dependency
collection, where the active implementation now lives in
`codex-rs/core/src/mcp_skill_dependencies.rs`; keeping the older
`codex-mcp` copy makes it unclear which implementation owns skill MCP
installation.
## What Changed
- Pruned unused `codex-mcp` re-exports from `codex-mcp/src/lib.rs`.
- Removed non-runtime helper methods from `McpConnectionManager` so it
stays focused on live MCP clients.
- Made `ToolPluginProvenance` lookup methods crate-private.
- Removed workspace-unreferenced snapshot wrapper APIs and
qualified-tool grouping helpers.
- Deleted the duplicate `codex-mcp` skill dependency module and tests
now that skill MCP dependency handling is owned by `core`.
## Verification
- `cargo check -p codex-mcp`
2026-04-25 06:36:07 -07:00
927 changed files with 76176 additions and 32700 deletions
description: Run a GitHub issue digest for openai/codex by feature-area labels, all areas, and configurable time windows. Use when asked to summarize recent Codex bug reports or enhancement requests, especially for owner-specific labels such as tui, exec, app, or similar areas.
---
# Codex Issue Digest
## Objective
Produce a headline-first, insight-oriented digest of `openai/codex` issues for the requested feature-area labels over the previous 24 hours by default. Honor a different duration when the user asks for one, for example "past week" or "48 hours". Default to a summary-only response; include details only when requested.
Include only issues that currently have `bug` or `enhancement` plus at least one requested owner label. If the user asks for all areas or all labels, collect `bug`/`enhancement` issues across all labels.
## Inputs
- Feature-area labels, for example `tui exec`
-`all areas` / `all labels` to scan all current feature labels
Use `--window "past week"` or `--window-hours 168` when the user asks for a non-default duration. Use `--all-labels` when the user says all areas or all labels.
2. Use the JSON as the source of truth. It includes new issues, new issue comments, new reactions/upvotes, current labels, current reaction counts, model-ready `summary_inputs`, and detailed `digest_rows`.
3. Choose the output mode from the user's request:
- Default mode: start the report with `## Summary` and do not emit `## Details`.
- Details-upfront mode: if the user asks for details, a table, a full digest, "include details", or similar, start with `## Summary`, then include `## Details`.
- Follow-up details mode: if the user asks for more detail after a summary-only digest, produce `## Details` from the existing collector JSON when it is still available; otherwise rerun the collector.
4. In `## Summary`, write a headline-first executive summary:
- The first nonblank line under `## Summary` must be a single-line headline or judgment, not a bullet. It should be useful even if the reader stops there.
- On quiet days, prefer exactly: `No major issues reported by users.` Use this when there are no elevated rows, no newly repeated theme, and nothing that needs owner action.
- When users are surfacing notable issues, make the headline name the count or theme, for example `Two issues are being surfaced by users:`.
- Immediately under an active headline, list only the issues or themes driving attention, ordered by importance. Start each line with the row's `attention_marker` when present, then a concise owner-readable description and inline issue refs.
- Treat `🔥🔥` as headline-worthy and `🔥` as elevated. Do not add fire emoji yourself; only copy the row's `attention_marker`.
- Keep any extra summary detail after the headline to 1-3 terse lines, only when it adds a decision-relevant caveat, repeated theme, or owner action.
- Do not include routine counts, broad stats, or low-signal table summaries in `## Summary` unless they change the headline. Put metadata and optional counts in `## Details` or the footer.
- In default mode, end the report with a concise prompt such as `Want details? I can expand this into the issue table.` Keep this separate from the summary headline so the headline stays clean.
- Cluster and name themes yourself from `summary_inputs`; the collector intentionally does not hard-code issue categories.
- Use a cluster only when the issues genuinely share the same product problem. If several issues merely share a broad platform or label, describe them individually.
- Do not omit a repeated theme just because its individual issues fall below the details table cutoff. Several similar reports should be called out as a repeated customer concern.
- For single-issue rows, summarize the concern directly instead of calling it a cluster.
- Use inline numbered issue links from each relevant row's `ref_markdown`.
- Example quiet summary:
```markdown
## Summary
No major issues reported by users.
Source: collector v4, git `abc123def456`, window `2026-04-27T00:00:00Z` to `2026-04-28T00:00:00Z`.
Want details? I can expand this into the issue table.
```
- Example active summary:
```markdown
## Summary
Two issues are being surfaced by users:
🔥🔥 Terminal launch hangs on startup [1](https://github.com/openai/codex/issues/123)
🔥 Resume switches model providers unexpectedly [2](https://github.com/openai/codex/issues/456)
Source: collector v4, git `abc123def456`, window `2026-04-27T00:00:00Z` to `2026-04-28T00:00:00Z`.
Want details? I can expand this into the issue table.
```
5. In `## Details`, when details are requested, include a compact table only when useful:
- Prefer rows from `digest_rows`; include a `Refs` column using each row's `ref_markdown`.
- Keep the table short; omit low-signal rows when the summary already covers them.
- Use compact columns such as marker, area, type, description, interactions, and refs.
- The `Description` cell should be a short owner-readable phrase. Use row `description`, title, body excerpts, and recent comments, but do not mechanically copy the raw GitHub issue title when it contains incidental details.
- A clear quiet/no-concern sentence when there is no meaningful signal.
6. Use the JSON `attention_marker` exactly. It is empty for normal rows, `🔥` for elevated rows, and `🔥🔥` for very high-attention rows. The actual cutoffs are in `attention_thresholds`.
7. Use inline numbered references where a row or bullet points to issues, for example `Compaction bugs [1](https://github.com/openai/codex/issues/123), [2](https://github.com/openai/codex/issues/456)`. Do not add a separate footnotes section.
8. Label `interactions` as `Interactions`; it counts posts/comments/reactions during the requested window, not unique people.
9. Mention the collector `script_version`, repo checkout `git_head`, and time window in one compact source line. In default mode, put this before the details prompt so the final line still asks whether the user wants details. In details-upfront mode, it can be the footer.
## Reaction Handling
The collector uses GitHub reactions endpoints, which include `created_at`, to count reactions created during the digest window for hydrated issues. It reports both in-window reaction counts and current reaction totals. Treat current reaction totals as standing engagement, and treat `new_reactions` / `new_upvotes` as windowed activity.
By default, the collector fetches issue comments with `since=<window start>` and caps the number of comment pages per issue. This keeps very long historical threads from dominating a digest run and focuses the report on recent posts. Use `--fetch-all-comments` only when exhaustive comment history is more important than runtime.
GitHub issue search is still seeded by issue `updated_at`, so a purely reaction-only issue may be missed if reactions do not bump `updated_at`. Covering every reaction-only case would require either a persisted snapshot store or a broader scan of labeled issues.
## Attention Markers
The collector scales attention markers by the requested time window. The baseline is 5 human user interactions for `🔥` and 10 for `🔥🔥` over 24 hours; longer or shorter windows scale those cutoffs linearly and round up. For example, a one-week report uses 35 and 70 interactions. Human user interactions are human-authored new issue posts, human-authored new comments, and human reactions created during the window, including upvotes. Bot posts and bot reactions are excluded. In prose, explain this as high user interaction rather than naming the emoji.
## Freshness
The automation should run from a repo checkout that contains this skill. For shared daily use, prefer one of these patterns:
- Run the automation in a checkout that is refreshed before the automation starts, for example with `git pull --ff-only`.
- If the automation cannot safely mutate the checkout, have it report the current `git_head` from the collector output so readers know which skill/script version produced the digest.
## Sample Owner Prompt
```text
Use $codex-issue-digest to run the Codex issue digest for labels tui and exec over the previous 24 hours.
```
```text
Use $codex-issue-digest to run the Codex issue digest for all areas over the past week.
"Issues are selected when they currently have bug or enhancement plus at least one requested owner label and were updated during the window.",
"By default, issue comments are fetched with since=window_start and a max page cap to avoid long historical threads; use --fetch-all-comments when exhaustive comment history is needed.",
"New issue comments are filtered by comment creation time within the window from the fetched comment set.",
"Reaction events are counted by GitHub reaction created_at timestamps for hydrated issues and fetched comments.",
"Current reaction totals are standing engagement signals; new_reactions and new_upvotes are windowed activity.",
"The collector does not assign semantic clusters; use summary_inputs as model-ready evidence for report-time clustering.",
"Pure reaction-only issues may be missed if GitHub issue search does not surface them via updated_at.",
"Issues updated during the window without a new issue body or new comment are retained because label/status edits can still be useful owner signals.",
@@ -19,6 +19,12 @@ In the codex-rs folder where the rust code lives:
- You can run `just argument-comment-lint` to run the lint check locally. This is powered by Bazel, so running it the first time can be slow if Bazel is not warmed up, though incremental invocations should take <15s. Most of the time, it is best to update the PR and let CI take responsibility for checking this (or run it asynchronously in the background after submitting the PR). Note CI checks all three platforms, which the local run does not.
- When possible, make `match` statements exhaustive and avoid wildcard arms.
- Newly added traits should include doc comments that explain their role and how implementations are expected to use them.
- Discourage both `#[async_trait]` and `#[allow(async_fn_in_trait)]` in Rust traits.
- Prefer native RPITIT trait methods with explicit `Send` bounds on the returned future, as in `3c7f013f9735` / `#16630`.
To try a writable legacy sandbox mode with these commands, pass an explicit config override such
as `-c 'sandbox_mode="workspace-write"'`.
### Selecting a sandbox policy via `--sandbox`
The Rust CLI exposes a dedicated `--sandbox` (`-s`) flag that lets you pick the sandbox policy **without** having to reach for the generic `-c/--config` option:
"description":"Optional full permissions profile for this command.\n\nDefaults to the user's configured permissions when omitted. Cannot be combined with `sandboxPolicy`."
},
"processId":{
"description":"Optional client-supplied, connection-scoped process id.\n\nRequired for `tty`, `streamStdin`, `streamStdoutStderr`, and follow-up `command/exec/write`, `command/exec/resize`, and `command/exec/terminate` calls. When omitted, buffered execution gets an internal id that is not exposed to the client.",
"description":"Optional full permissions profile for this command.\n\nDefaults to the user's configured permissions when omitted. Cannot be combined with `sandboxPolicy`."
},
"processId":{
"description":"Optional client-supplied, connection-scoped process id.\n\nRequired for `tty`, `streamStdin`, `streamStdoutStderr`, and follow-up `command/exec/write`, `command/exec/resize`, and `command/exec/terminate` calls. When omitted, buffered execution gets an internal id that is not exposed to the client.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
},
"PluginAuthPolicy":{
"enum":[
"ON_INSTALL",
"ON_USE"
],
"type":"string"
},
"PluginInstallPolicy":{
"enum":[
"NOT_AVAILABLE",
"AVAILABLE",
"INSTALLED_BY_DEFAULT"
],
"type":"string"
},
"PluginInterface":{
"properties":{
"brandColor":{
"type":[
"string",
"null"
]
},
"capabilities":{
"items":{
"type":"string"
},
"type":"array"
},
"category":{
"type":[
"string",
"null"
]
},
"composerIcon":{
"anyOf":[
{
"$ref":"#/definitions/AbsolutePathBuf"
},
{
"type":"null"
}
],
"description":"Local composer icon path, resolved from the installed plugin package."
},
"composerIconUrl":{
"description":"Remote composer icon URL from the plugin catalog.",
"type":[
"string",
"null"
]
},
"defaultPrompt":{
"description":"Starter prompts for the plugin. Capped at 3 entries with a maximum of 128 characters per entry.",
"items":{
"type":"string"
},
"type":[
"array",
"null"
]
},
"developerName":{
"type":[
"string",
"null"
]
},
"displayName":{
"type":[
"string",
"null"
]
},
"logo":{
"anyOf":[
{
"$ref":"#/definitions/AbsolutePathBuf"
},
{
"type":"null"
}
],
"description":"Local logo path, resolved from the installed plugin package."
},
"logoUrl":{
"description":"Remote logo URL from the plugin catalog.",
"type":[
"string",
"null"
]
},
"longDescription":{
"type":[
"string",
"null"
]
},
"privacyPolicyUrl":{
"type":[
"string",
"null"
]
},
"screenshotUrls":{
"description":"Remote screenshot URLs from the plugin catalog.",
"items":{
"type":"string"
},
"type":"array"
},
"screenshots":{
"description":"Local screenshot paths, resolved from the installed plugin package.",
"items":{
"$ref":"#/definitions/AbsolutePathBuf"
},
"type":"array"
},
"shortDescription":{
"type":[
"string",
"null"
]
},
"termsOfServiceUrl":{
"type":[
"string",
"null"
]
},
"websiteUrl":{
"type":[
"string",
"null"
]
}
},
"required":[
"capabilities",
"screenshotUrls",
"screenshots"
],
"type":"object"
},
"PluginSource":{
"oneOf":[
{
"properties":{
"path":{
"$ref":"#/definitions/AbsolutePathBuf"
},
"type":{
"enum":[
"local"
],
"title":"LocalPluginSourceType",
"type":"string"
}
},
"required":[
"path",
"type"
],
"title":"LocalPluginSource",
"type":"object"
},
{
"properties":{
"path":{
"type":[
"string",
"null"
]
},
"refName":{
"type":[
"string",
"null"
]
},
"sha":{
"type":[
"string",
"null"
]
},
"type":{
"enum":[
"git"
],
"title":"GitPluginSourceType",
"type":"string"
},
"url":{
"type":"string"
}
},
"required":[
"type",
"url"
],
"title":"GitPluginSource",
"type":"object"
},
{
"description":"The plugin is available in the remote catalog. Download metadata is kept server-side and is not exposed through the app-server API.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
},
"ActivePermissionProfile":{
"properties":{
"extends":{
"default":null,
"description":"Parent profile identifier once permissions profiles support inheritance. This is currently always `null`.",
"type":[
"string",
"null"
]
},
"id":{
"description":"Identifier from `default_permissions` or the implicit built-in default, such as `:workspace` or a user-defined `[permissions.<id>]` profile.",
"type":"string"
},
"modifications":{
"default":[],
"description":"Bounded user-requested modifications applied on top of the named profile, if any.",
"description":"Canonical active permissions view for this thread."
},
"reasoningEffort":{
"anyOf":[
{
@@ -2528,7 +2554,7 @@
"$ref":"#/definitions/SandboxPolicy"
}
],
"description":"Legacy sandbox policy retained for compatibility. New clients should use `permissionProfile` when present as the canonical active permissions view."
"description":"Legacy sandbox policy retained for compatibility. Experimental clients should prefer `permissionProfile` when they need exact runtime permissions."
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
},
"ActivePermissionProfile":{
"properties":{
"extends":{
"default":null,
"description":"Parent profile identifier once permissions profiles support inheritance. This is currently always `null`.",
"type":[
"string",
"null"
]
},
"id":{
"description":"Identifier from `default_permissions` or the implicit built-in default, such as `:workspace` or a user-defined `[permissions.<id>]` profile.",
"type":"string"
},
"modifications":{
"default":[],
"description":"Bounded user-requested modifications applied on top of the named profile, if any.",
"description":"Canonical active permissions view for this thread."
},
"reasoningEffort":{
"anyOf":[
{
@@ -2528,7 +2554,7 @@
"$ref":"#/definitions/SandboxPolicy"
}
],
"description":"Legacy sandbox policy retained for compatibility. New clients should use `permissionProfile` when present as the canonical active permissions view."
"description":"Legacy sandbox policy retained for compatibility. Experimental clients should prefer `permissionProfile` when they need exact runtime permissions."
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
},
"ActivePermissionProfile":{
"properties":{
"extends":{
"default":null,
"description":"Parent profile identifier once permissions profiles support inheritance. This is currently always `null`.",
"type":[
"string",
"null"
]
},
"id":{
"description":"Identifier from `default_permissions` or the implicit built-in default, such as `:workspace` or a user-defined `[permissions.<id>]` profile.",
"type":"string"
},
"modifications":{
"default":[],
"description":"Bounded user-requested modifications applied on top of the named profile, if any.",
"description":"Canonical active permissions view for this thread."
},
"reasoningEffort":{
"anyOf":[
{
@@ -2528,7 +2554,7 @@
"$ref":"#/definitions/SandboxPolicy"
}
],
"description":"Legacy sandbox policy retained for compatibility. New clients should use `permissionProfile` when present as the canonical active permissions view."
"description":"Legacy sandbox policy retained for compatibility. Experimental clients should prefer `permissionProfile` when they need exact runtime permissions."
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.