Fixes stale test fixtures left after the active bundled model catalog
updates in #18586 and #18388. Those changes made `gpt-5.4` the current
default and removed several older hardcoded slugs, which left Windows
Bazel shards failing TUI and config tests.
What changed:
- Refresh TUI model migration, availability NUX, plan-mode, status, and
snapshot fixtures to use active bundled model slugs.
- Update the config edit test expectation for the TOML-quoted
`"gpt-5.2"` migration key.
- Move the model catalog tests into
`codex-rs/tui/src/app/tests/model_catalog.rs` so touching them does not
trip the blob-size policy for `app.rs`.
Verification:
- CI Bazel/lint checks are expected to cover the affected test shards.
## Summary
This PR aims to improve integration between the realtime model and the
codex agent by sharing more context with each other. In particular, we
now share full realtime conversation transcript deltas in addition to
the delegation message.
`realtime_conversation.rs` now turns a handoff into:
```
<realtime_delegation>
<input>...</input>
<transcript_delta>...</transcript_delta>
</realtime_delegation>
```
## Implementation notes
The transcript is accumulated in the realtime websocket layer as parsed
realtime events arrive. When a background-agent handoff is requested,
the current transcript snapshot is copied onto the handoff event and
then serialized by `realtime_conversation.rs` into the hidden realtime
delegation envelope that Codex receives as user-turn context.
For Realtime V2, the session now explicitly enables input audio
transcription, and the parser handles the relevant input/output
transcript completion events so the snapshot includes both user speech
and realtime model responses. The delegation `<input>` remains the
actual handoff request, while `<transcript_delta>` carries the
surrounding conversation history for context.
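
As a rough illustration of the envelope shape described above, the following is a minimal sketch, not the code in `realtime_conversation.rs`. The function name `render_realtime_delegation` is hypothetical, and the sketch skips any escaping of markup characters, which a real serializer might need.

```rust
/// Hypothetical helper: renders the hidden delegation envelope.
/// `input` is the actual handoff request; `transcript_delta` is the
/// surrounding conversation history. Escaping is intentionally omitted
/// in this sketch (an assumption, not the PR's behavior).
fn render_realtime_delegation(input: &str, transcript_delta: &str) -> String {
    format!(
        "<realtime_delegation>\n<input>{}</input>\n<transcript_delta>{}</transcript_delta>\n</realtime_delegation>",
        input, transcript_delta
    )
}
```

Keeping `<input>` and `<transcript_delta>` as separate elements lets the consumer tell the request apart from the context that accompanies it.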
Reviewers should note that the transcript payload is intended for Codex
context sharing, not UI rendering. The realtime delegation envelope
should stay hidden from the user-facing transcript surface, while still
being included in the background-agent turn so Codex can answer with the
same conversational context the realtime model had.
## Why
Guardian review analytics needs a Rust event shape that matches the
backend schema while avoiding unnecessary PII exposure from reviewed
tool calls. This PR narrows the analytics payload to the fields we
intend to emit and keeps shared Guardian assessment enums in protocol
instead of duplicating equivalent analytics-only enums.
## What changed
- Uses protocol Guardian enums directly for `risk_level`,
`user_authorization`, `outcome`, and command source values.
- Removes high-risk reviewed-action fields from the analytics payload,
including raw commands, display strings, working directories, file
paths, network targets/hosts, justification text, retry reason, and
rationale text.
- Makes `target_item_id` and `tool_call_count` nullable so the Codex
event can represent cases where the app-server protocol or producer does
not have those values.
- Keeps lower-risk structured reviewed-action metadata such as sandbox
permissions, permission profile, `tty`, `execve` source/program, network
protocol/port, and MCP connector/tool labels.
- Adds an analytics reducer/client test covering `codex_guardian_review`
serialization with an optional `target_item_id` and absent removed
fields.
## Verification
- `cargo test -p codex-analytics
guardian_review_event_ingests_custom_fact_with_optional_target_item`
- `cargo fmt --check`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17692).
* #17696
* #17695
* #17693
* __->__ #17692
Wires `patch_updated` events through `app_server`. These events are parsed
and streamed while `apply_patch` is being written by the model. Also adds
500 ms of buffering to the `patch_updated` events in the `diff_consumer`.
The eventual goal is to use this to display better progress indicators in
the codex app.
- Replace the active models-manager catalog with the deleted core
catalog contents.
- Replace stale hardcoded test model slugs with current bundled model
slugs.
- Keep this as a stacked change on top of the cleanup PR.
This is the second cleanup in the await-holding lint stack. The
higher-level goal, following https://github.com/openai/codex/pull/18178
and https://github.com/openai/codex/pull/18398, is to enable Clippy
coverage for guards held across `.await` points without carrying broad
suppressions.
The stack is working toward enabling Clippy's
[`await_holding_lock`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_lock)
lint and the configurable
[`await_holding_invalid_type`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_invalid_type)
lint for Tokio guard types.
Several existing fields used `tokio::sync::Mutex<()>` only as
one-at-a-time async gates. Those guards intentionally lived across
`.await` while an operation was serialized. A mutex over `()` suggests
protected data and trips the await-holding lint shape; a single-permit
`tokio::sync::Semaphore` expresses the intended serialization directly.
## What changed
- Replace `Mutex<()>` serialization gates with `Semaphore::new(1)` for
agent identity ensure, exec policy updates, guardian review session
reuse, plugin remote sync, managed network proxy refresh, auth token
refresh, and RMCP session recovery.
- Update call sites from `lock().await` / `try_lock()` to
`acquire().await` / `try_acquire()`.
- Map closed-semaphore errors into the existing local error types, even
though these semaphores are owned for the lifetime of their managers.
- Update session test builders for the new
`managed_network_proxy_refresh_lock` type.
## Verification
- The split stack was verified at the final lint-enabling head with
`just clippy`.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18403).
* #18698
* #18423
* #18418
* __->__ #18403
## Why
`PermissionProfile` needs stable, canonical file-system semantics before
it can become the primary runtime permissions abstraction. Without a
canonical form, callers have to keep re-deriving legacy sandbox maps and
profile comparisons remain lossy or order-dependent.
## What changed
This adds canonicalization helpers for `FileSystemPermissions` and
`PermissionProfile`, expands special paths into explicit sandbox
entries, and updates permission request/conversion paths to consume
those canonical entries. It also tightens the legacy bridge so root-wide
write profiles with narrower carveouts are not silently projected as
full-disk legacy access.
## Verification
- `cargo test -p codex-protocol
root_write_with_read_only_child_is_not_full_disk_write -- --nocapture`
- `cargo test -p codex-sandboxing permission -- --nocapture`
- `cargo test -p codex-tui permissions -- --nocapture`
## Why
Fresh app-server thread startup can create a shell snapshot through a
temp file and then promote it to the final snapshot path. The previous
implementation briefly wrapped the temp path in `ShellSnapshot`, so
after a successful rename its `Drop` attempted to delete the old temp
path and could log a false `ENOENT` warning.
Fixes #17549.
## What changed
- Validate the temp snapshot path directly before promotion.
- Rename the temp path directly to the final snapshot path.
- Keep explicit cleanup of the temp path on validation or finalization
failures.
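
The promotion flow above can be sketched with plain `std::fs` calls. This is an illustrative outline only: `promote_snapshot` and its `validate` callback are hypothetical names, and the real code's validation and error types are not shown in this description.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical sketch: promote a temp snapshot to its final path without
/// wrapping the temp path in a type whose `Drop` would try to delete it.
fn promote_snapshot(
    temp_path: &Path,
    final_path: &Path,
    validate: impl Fn(&Path) -> io::Result<()>,
) -> io::Result<()> {
    // Validate the temp file directly; clean it up explicitly on failure.
    if let Err(err) = validate(temp_path) {
        let _ = fs::remove_file(temp_path);
        return Err(err);
    }
    // Rename straight to the final path; clean up only if the rename fails.
    if let Err(err) = fs::rename(temp_path, final_path) {
        let _ = fs::remove_file(temp_path);
        return Err(err);
    }
    // On success nothing refers to the old temp path any more, so no
    // destructor can attempt to delete it and log a spurious warning.
    Ok(())
}
```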
## Summary
Introduces a single background/control-plane agent task for ChatGPT
backend requests that do not have a thread-scoped task, with
`AuthManager` owning the default ChatGPT backend authorization decision.
Callers now ask `AuthManager` for the default ChatGPT backend
authorization header. `AuthManager` decides whether that is bearer or
background AgentAssertion based on config/internal state, while
low-level bootstrap paths can explicitly request bearer-only auth.
This PR is stacked on PR4 and focuses on the shared background task auth
plumbing plus the first tranche of backend/control-plane consumers. The
remaining callsite wiring is split into PR4.2 to keep review size down.
## Stack
- PR1: https://github.com/openai/codex/pull/17385 - add
`features.use_agent_identity`
- PR2: https://github.com/openai/codex/pull/17386 - register agent
identities when enabled
- PR3: https://github.com/openai/codex/pull/17387 - register agent tasks
when enabled
- PR3.1: https://github.com/openai/codex/pull/17978 - persist and
prewarm registered tasks per thread
- PR4: https://github.com/openai/codex/pull/17980 - use task-scoped
`AgentAssertion` for downstream calls
- PR4.1: this PR - introduce AuthManager-owned background/control-plane
`AgentAssertion` auth
- PR4.2: https://github.com/openai/codex/pull/18260 - use background
task auth for additional backend/control-plane calls
## What Changed
- add background task registration and assertion minting inside
`codex-login`
- persist `agent_identity.background_task_id` separately from
per-session task state
- make `BackgroundAgentTaskManager` private to `codex-login`; call sites
do not instantiate or pass it around
- teach `AuthManager` the ChatGPT backend base URL and feature-derived
background auth mode from resolved config
- expose bearer-only helpers for bootstrap/registration/refresh-style
paths that must not use AgentAssertion
- wire `AuthManager` default ChatGPT authorization through app listing,
connector directory listing, remote plugins, MCP status/listing,
analytics, and core-skills remote calls
- preserve bearer fallback when the feature is disabled, the backend
host is unsupported, or background task registration is not available
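
The bearer fallback rule in the last bullet can be summarized as a small decision function. This is a sketch under stated assumptions: `ChatGptAuth`, `BackgroundAuthState`, and `default_chatgpt_auth` are hypothetical names for illustration and do not reflect the actual `AuthManager` API.

```rust
/// Hypothetical result of the default-authorization decision.
#[derive(Debug, PartialEq)]
enum ChatGptAuth {
    Bearer,
    BackgroundAgentAssertion,
}

/// Hypothetical view of the config/internal state the decision depends on.
struct BackgroundAuthState {
    feature_enabled: bool,
    backend_host_supported: bool,
    background_task_registered: bool,
}

/// Uses the background AgentAssertion only when every precondition holds;
/// otherwise falls back to bearer auth.
fn default_chatgpt_auth(state: &BackgroundAuthState) -> ChatGptAuth {
    if state.feature_enabled
        && state.backend_host_supported
        && state.background_task_registered
    {
        ChatGptAuth::BackgroundAgentAssertion
    } else {
        ChatGptAuth::Bearer
    }
}
```

Centralizing the decision in one place is what lets call sites ask for "the default header" without knowing which scheme is in effect.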
## Validation
- `just fmt`
- `cargo check -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `cargo test -p codex-login agent_identity`
- `cargo test -p codex-model-provider bearer_auth_provider`
- `cargo test -p codex-core agent_assertion`
- `cargo test -p codex-app-server remote_control`
- `cargo test -p codex-cloud-requirements fetch_cloud_requirements`
- `cargo test -p codex-models-manager manager::tests`
- `cargo test -p codex-chatgpt`
- `cargo test -p codex-cloud-tasks`
- `just fix -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `just fix -p codex-app-server`
- `git diff --check`
The initial goal of this PR was to stabilise the test
`fs_watch_allows_missing_file_targets`. After further investigation, it
turns out that this test was always failing, and the instability came
mostly from a race between timeouts.
The test was meant to cover what happens when a notifier is subscribed
before the file exists. However, the main code was broken: if the file
did not exist yet, the notifier never fired, even after the file was
eventually created.
This PR fixes the main code (and the test). When a file does not exist,
we now watch its parent directory and refresh the watch once the file is
created.
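
A minimal sketch of the target-selection idea, assuming the fallback directory is the file's parent. `WatchTarget` and `choose_watch_target` are hypothetical names, and the sketch only decides what to watch; it does not register a watcher.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical description of what the watcher should subscribe to.
#[derive(Debug, PartialEq)]
enum WatchTarget {
    /// The file exists, so it can be watched directly.
    File(PathBuf),
    /// The file is missing: watch the parent and switch once `file` appears.
    ParentUntilCreated { parent: PathBuf, file: PathBuf },
}

/// Picks the file itself when it exists, otherwise its parent directory.
/// Returns `None` when the path has no parent to fall back to.
fn choose_watch_target(path: &Path) -> Option<WatchTarget> {
    if path.exists() {
        return Some(WatchTarget::File(path.to_path_buf()));
    }
    let parent = path.parent()?;
    Some(WatchTarget::ParentUntilCreated {
        parent: parent.to_path_buf(),
        file: path.to_path_buf(),
    })
}
```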
Make the morpheus agent (which is the phase 2 memories agent) follow the
agent-v2 path system by naming it `/morpheus`. To maintain the path
primitive, this means moving it to a dedicated `AgentControl`.
Co-authored-by: Codex <noreply@openai.com>
## Summary
This is the AgentAssertion downstream slice for feature-gated agent
identity support, replacing the oversized AgentAssertion slice from PR
#17807.
It isolates task-scoped downstream AgentAssertion wiring on top of the
merged PR3.1 work without re-carrying the earlier agent registration,
task registration, or task-state history.
This PR includes the task-scoped bug-fix call sites from the review:
generic file upload auth, MCP OpenAI file upload auth, and ARC monitor
auth. Broader user/control-plane calls move to PR4.1 and PR4.2.
## Stack
- PR1: https://github.com/openai/codex/pull/17385 - add
`features.use_agent_identity`
- PR2: https://github.com/openai/codex/pull/17386 - register agent
identities when enabled
- PR3: https://github.com/openai/codex/pull/17387 - register agent tasks
when enabled
- PR3.1: https://github.com/openai/codex/pull/17978 - persist and
prewarm registered tasks per thread
- PR4: this PR - use task-scoped `AgentAssertion` downstream when
enabled
- PR4.1: https://github.com/openai/codex/pull/18094 - introduce
AuthManager-owned background/control-plane `AgentAssertion` auth
- PR4.2: https://github.com/openai/codex/pull/18260 - use background
task auth for additional backend/control-plane calls
## What Changed
- add AgentAssertion envelope generation in `codex-core`
- route downstream HTTP and websocket auth through AgentAssertion when
an agent task is present
- extend the model-provider auth provider so non-bearer authorization
schemes can be passed through cleanly
- make generic file uploads attach the full authorization header value
- make MCP OpenAI file uploads use the cached thread agent task
assertion when present
- make ARC monitor calls use the cached thread agent task assertion when
present
## Why
The original PR had drifted ancestry and showed a much larger diff than
the semantic change actually required. Restacking it onto PR3.1 keeps
the reviewable surface down to the downstream assertion slice.
## Validation
- `just fmt`
- `cargo check -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `cargo test -p codex-model-provider bearer_auth_provider`
- `cargo test -p codex-core agent_assertion`
- `cargo test -p codex-app-server remote_control`
- `cargo test -p codex-cloud-requirements fetch_cloud_requirements`
- `cargo test -p codex-models-manager manager::tests`
- `cargo test -p codex-chatgpt`
- `cargo test -p codex-cloud-tasks`
- `cargo test -p codex-login agent_identity`
- `just fix -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `just fix -p codex-app-server`
- `git diff --check`
## Summary
Third PR in the split from #17956. Stacked on #18220.
- shows workspace-owner/member-specific rate-limit messages behind
`workspace_owner_usage_nudge`
- prompts workspace members to notify the owner or request a usage-limit
increase
- sends the confirmed nudge through the app-server API and renders
completion feedback
- adds focused TUI snapshot coverage for prompts and completion states
- feature gate
## Validation
- `cargo test -p codex-backend-client`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server rate_limits`
- `cargo test -p codex-tui workspace_`
- `cargo test -p codex-tui status_`
- `just fmt`
- `just fix -p codex-backend-client`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
## Summary
Remove the skills message from the guardian dev message.
## Test Plan
- [x] Ran locally
- [x] Added unit test
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- persist registered agent tasks in the session state update stream so
the thread can reuse them
- prewarm task registration once identity registration succeeds, while
keeping startup failures best-effort
- isolate the session-side task lifecycle into a dedicated module so
AgentIdentityManager and RegisteredAgentTask do not leak across as many
core layers
## Testing
- cargo test -p codex-core startup_agent_task_prewarm
- cargo test -p codex-core
cached_agent_task_for_current_identity_clears_stale_task
- cargo test -p codex-core record_initial_history_
## Summary
- Add the executor-backed RMCP stdio transport.
- Wire MCP stdio placement through the executor environment config.
- Cover local and executor-backed stdio paths with the existing MCP test
helpers.
## Stack
```text
o #18027 [6/6] Fail exec client operations after disconnect
│
@ #18212 [5/6] Wire executor-backed MCP stdio
│
o #18087 [4/6] Abstract MCP stdio server launching
│
o #18020 [3/6] Add pushed exec process events
│
o #18086 [2/6] Support piped stdin in exec process API
│
o #18085 [1/6] Add MCP server environment config
│
o main
```
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- Reverts PR #17749 so queued inter-agent mail can again preempt after
reasoning/commentary output item boundaries.
- Applies the revert to the current `codex/turn.rs` module layout and
restores the prior pending-input test expectations/snapshots.
## Testing
- `just fmt`
- `cargo test -p codex-core --test all pending_input`
- `cargo test -p codex-core` failed in unrelated
`tools::js_repl::tests::js_repl_imported_local_files_can_access_repl_globals`:
dotslash download hit `mktemp: mkdtemp failed ... Operation not
permitted` in the sandbox temp dir.
Co-authored-by: Codex <noreply@openai.com>
Adds `max_context_window` to model metadata and routes core context-window
reads through resolved model info. Config `model_context_window` overrides
are clamped to `max_context_window` when present; without an override, the
model `context_window` is used.
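
The resolution rule reads as follows in code. This is a sketch only: `effective_context_window` is a hypothetical name, and it assumes "when present" refers to `max_context_window` being set in the model metadata.

```rust
/// Hypothetical helper mirroring the rule above:
/// - an override is clamped to `max_context_window` when the model has one;
/// - an override passes through unchanged when there is no maximum (assumed);
/// - with no override, the model's own `context_window` is used.
fn effective_context_window(
    model_context_window: u64,
    max_context_window: Option<u64>,
    config_override: Option<u64>,
) -> u64 {
    match (config_override, max_context_window) {
        (Some(requested), Some(max)) => requested.min(max),
        (Some(requested), None) => requested,
        (None, _) => model_context_window,
    }
}
```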
## Summary
Move the marketplace remove implementation into shared core logic so
both the CLI command and follow-up app-server RPC can reuse the same
behavior.
This change:
- adds a shared `codex_core::plugins::remove_marketplace(...)` flow
- moves validation, config removal, and installed-root deletion out of
the CLI
- keeps the CLI as a thin wrapper over the shared implementation
- adds focused core coverage for the shared remove path
## Validation
- `just fmt`
- focused local coverage for the shared remove path
- heavier follow-up validation deferred to stacked PR CI
## Summary
- Populate `PluginDetail.description` in core for uninstalled cross-repo
plugins when detailed fields are unavailable until install.
- Include the source Git URL plus optional path/ref/sha details in that
fallback description.
- Keep `details_unavailable_reason` as the structured signal while
app-server forwards the description normally.
- Add plugin-read coverage proving the response does not clone the
remote source just to show the message.
## Why
Uninstalled cross-repo plugins intentionally return sparse detail data
so listing/reading does not clone the plugin source. Without a
description, Desktop and TUI detail pages look like an ordinary empty
plugin. This gives users a concrete explanation and source pointer while
keeping the existing structured reason available for callers.
## Validation
- `just fmt`
- `cargo test -p codex-core
read_plugin_for_config_uninstalled_git_source_requires_install_without_cloning`
- `cargo test -p codex-app-server plugin_read --test all`
- `just fix -p codex-core`
- `just fix -p codex-app-server`
Note: `cargo test -p codex-app-server` was also attempted before the
latest refactor and failed broadly in unrelated v2
thread/realtime/review/skills suites; the new plugin-read test passed in
that run as well.
Cap the model-visible skills section to a small share of the context
window, with a fallback character budget, and keep only as many implicit
skills as fit within that budget.
Emit a non-fatal warning when enabled skills are omitted, and add a new
app-server warning notification.
Record thread-start skill metrics for total enabled skills, kept skills,
and whether truncation happened.
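
The budgeting behavior can be sketched as below. Every constant and name here is an illustrative assumption (the share, the characters-per-token ratio, and the fallback size are not taken from the change), and the sketch measures each skill entry by its character count.

```rust
// Illustrative assumptions only; not the values used by Codex.
const SKILLS_SHARE_PERCENT: usize = 2;
const APPROX_CHARS_PER_TOKEN: usize = 4;
const FALLBACK_BUDGET_CHARS: usize = 8_000;

/// Budget is a small share of the context window when it is known,
/// otherwise a fixed fallback character budget.
fn skills_budget_chars(context_window_tokens: Option<usize>) -> usize {
    match context_window_tokens {
        Some(tokens) => tokens * APPROX_CHARS_PER_TOKEN * SKILLS_SHARE_PERCENT / 100,
        None => FALLBACK_BUDGET_CHARS,
    }
}

/// Keeps entries in order until the next one would exceed the budget.
/// Returns the kept entries and whether any were omitted.
fn fit_skills<'a>(entries: &[&'a str], budget_chars: usize) -> (Vec<&'a str>, bool) {
    let mut used = 0;
    let mut kept = Vec::new();
    for entry in entries {
        let cost = entry.chars().count();
        if used + cost > budget_chars {
            break;
        }
        used += cost;
        kept.push(*entry);
    }
    let truncated = kept.len() < entries.len();
    (kept, truncated)
}
```

The `truncated` flag corresponds to the condition that would drive the warning and the truncation metric.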
---------
Co-authored-by: Matthew Zeng <mzeng@openai.com>
Co-authored-by: Codex <noreply@openai.com>
## Summary
- trust-gate project `.codex` layers consistently, including repos that
have `.codex/hooks.json` or `.codex/execpolicy/*.rules` but no
`.codex/config.toml`
- keep disabled project layers in the config stack so nested trusted
project layers still resolve correctly, while preventing hooks and exec
policies from loading until the project is trusted
- update app-server/TUI onboarding copy to make the trust boundary
explicit and add regressions for loader, hooks, exec-policy, and
onboarding coverage
## Security
Before this change, an untrusted repo could auto-load project hooks or
exec policies from `.codex/` as long as `config.toml` was absent. This
makes trust the single gate for project-local config, hooks, and exec
policies.
## Stack
- Parent of #15936
## Test
- cargo test -p codex-core without_config_toml
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Update the plugin API for the new remote plugin model.
The mental model is no longer “keep local plugin state in sync with
remote.” Instead, local and remote plugins are becoming separate
sources. Remote catalog entries can be shown directly from the remote
API before installation; after installation they are still downloaded
into the local cache for execution, but remote installed state will come
from the API and be held in memory rather than being read from config.
## API changes
- Remove `forceRemoteSync` from `plugin/list`, `plugin/install`, and
`plugin/uninstall`.
- Remove `remoteSyncError` from `plugin/list`.
- Add remote-capable metadata to `plugin/list` / `plugin/read`:
- nullable `marketplaces[].path`
- `source: { type: "remote", downloadUrl }`
- URL asset fields alongside local path fields:
`composerIconUrl`, `logoUrl`, `screenshotUrls`
- Make `plugin/read` and `plugin/install` source-compatible:
- `marketplacePath?: AbsolutePathBuf | null`
- `remoteMarketplaceName?: string | null`
- exactly one source is required at runtime
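
The "exactly one source" rule could be enforced along these lines. This is a hedged sketch: `PluginSourceSelector` and `select_source` are hypothetical names, paths are modeled as plain strings rather than `AbsolutePathBuf`, and the error messages are made up for illustration.

```rust
/// Hypothetical resolved source for `plugin/read` or `plugin/install`.
#[derive(Debug, PartialEq)]
enum PluginSourceSelector {
    LocalMarketplace(String),
    RemoteMarketplace(String),
}

/// Accepts the two optional request fields and requires exactly one of them.
fn select_source(
    marketplace_path: Option<String>,
    remote_marketplace_name: Option<String>,
) -> Result<PluginSourceSelector, String> {
    match (marketplace_path, remote_marketplace_name) {
        (Some(path), None) => Ok(PluginSourceSelector::LocalMarketplace(path)),
        (None, Some(name)) => Ok(PluginSourceSelector::RemoteMarketplace(name)),
        (Some(_), Some(_)) => Err(
            "provide only one of marketplacePath or remoteMarketplaceName".to_string(),
        ),
        (None, None) => Err(
            "one of marketplacePath or remoteMarketplaceName is required".to_string(),
        ),
    }
}
```

Both fields stay optional on the wire for source compatibility, so the check has to happen at runtime rather than in the schema.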
## Why
The large Rust test suites are slow and include some of our flakiest
tests, so we want to run them with Bazel native sharding while keeping
shard membership stable between runs.
This is the simpler follow-up to the explicit-label experiment in
#17998. Since #18397 upgraded Codex to `rules_rs` `0.0.58`, which
includes the stable test-name hashing support from
hermeticbuild/rules_rust#14, this PR only needs to wire Codex's Bazel
macros into that support.
Using native sharding preserves BuildBuddy's sharded-test UI and Bazel's
per-shard test action caching. Using stable name hashing avoids
reshuffling every test when one test is added or removed.
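
To show why name hashing keeps shard membership stable, here is a small sketch. It uses FNV-1a purely because it is simple and deterministic; the hash that `rules_rs` actually uses is not stated here and may differ. `shard_for_test` is a hypothetical name.

```rust
/// FNV-1a (64-bit). Chosen for this sketch only because it is deterministic
/// across runs and platforms; the real implementation may use another hash.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Assigns a test to a shard from its name alone, so adding or removing
/// other tests never moves it. `shard_count` must be greater than zero.
fn shard_for_test(test_name: &str, shard_count: u64) -> u64 {
    fnv1a_64(test_name.as_bytes()) % shard_count
}
```

By contrast, assigning shards by position in the sorted test list would shift every later test whenever one test is inserted.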
## What Changed
`codex_rust_crate` now accepts `test_shard_counts` and applies the right
Bazel/rules_rust attributes to generated unit and integration test
rules. Matched tests are also marked `flaky = True`, giving them Bazel's
default three attempts.
This PR shards these labels 8 ways:
```text
//codex-rs/core:core-all-test
//codex-rs/core:core-unit-tests
//codex-rs/app-server:app-server-all-test
//codex-rs/app-server:app-server-unit-tests
//codex-rs/tui:tui-unit-tests
```
## Verification
`bazel query --output=build` over the selected public labels and their
inner unit-test binaries confirmed the expected `shard_count = 8`,
`flaky = True`, and `experimental_enable_sharding = True` attributes.
Also verified that we see the shards as expected in BuildBuddy so they
can be analyzed independently.
Co-authored-by: Codex <noreply@openai.com>
## Summary
- pass split filesystem sandbox policy/cwd through apply_patch contexts,
while omitting legacy-equivalent policies to keep payloads small
- keep the fs helper compatible with legacy Landlock by avoiding helper
read-root permission expansion in that mode and disabling helper network
access
## Root Cause
`d626dc38950fb40a1a5ad0a8ffab2485e3348c53` routed exec-server filesystem
operations through a sandboxed helper. That path forwarded legacy
Landlock into a helper policy shape that could require direct
split-policy enforcement. Sandboxed `apply_patch` hit that edge through
the filesystem abstraction.
The same 0.121 edit-regression path is consistent with #18354: normal
writes route through the `apply_patch` filesystem helper, fail under
sandbox, and then surface the generic retry-without-sandbox prompt.
Fixes #18069
Fixes #18354
## Validation
- `cd codex-rs && just fmt`
- earlier branch validation before merging current `origin/main` and
dropping the now-separate PATH fix:
- `cd codex-rs && cargo test -p codex-exec-server`
- `cd codex-rs && cargo test -p codex-core file_system_sandbox_context`
- `cd codex-rs && just fix -p codex-exec-server`
- `cd codex-rs && just fix -p codex-core`
- `git diff --check`
- `cd codex-rs && cargo clean`
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add first-class marketplace support for git-backed plugin sources
- keep the newer marketplace parsing behavior from `main`, including
alternate manifest locations and string local sources
- materialize remote plugin sources during install, detail reads, and
non-curated cache refresh
- expose git plugin source metadata through the app-server protocol
## Details
This teaches the marketplace parser to accept all of the following:
- local string sources such as `"source": "./plugins/foo"`
- local object sources such as
`{"source":"local","path":"./plugins/foo"}`
- remote repo-root sources such as
`{"source":"url","url":"https://github.com/org/repo.git"}`
- remote subdir sources such as
`{"source":"git-subdir","url":"owner/repo","path":"plugins/foo","ref":"main","sha":"..."}`
It also preserves the newer tolerant behavior from `main`: invalid or
unsupported plugin entries are skipped instead of breaking the whole
marketplace.
## Validation
- `cargo test -p codex-core plugins::marketplace::tests`
- `just fix -p codex-core`
- `just fmt`
## Notes
- A full `cargo test -p codex-core` run still hit unrelated existing
failures in agent and multi-agent tests during this session; the
marketplace-focused suite passed after the rebase resolution.
Follow-up to https://github.com/openai/codex/pull/18178, where we called
out enabling the await-holding lint as a follow-up.
The long-term goal is to enable Clippy coverage for async guards held
across awaits. This PR is intentionally only the first, low-risk cleanup
pass: it narrows obvious lock guard lifetimes and leaves
`codex-rs/Cargo.toml` unchanged so the lint is not enabled until the
remaining cases are fixed or explicitly justified. It intentionally
leaves the active-turn/turn-state locking pattern alone because those
checks and mutations need to stay atomic.
## Common fixes used here
These are the main patterns reviewers should expect in this PR, and they
are also the patterns to reach for when fixing future `await_holding_*`
findings:
- **Scope the guard to the synchronous work.** If the code only needs
data from a locked value, move the lock into a small block, clone or
compute the needed values, and do the later `.await` after the block.
- **Use direct one-line mutations when there is no later await.** Cases
like `map.lock().await.remove(&id)` are acceptable when the guard is
only needed for that single mutation and the statement ends before any
async work.
- **Drain or clone work out of the lock before notifying or awaiting.**
For example, the JS REPL drains pending exec senders into a local vector
and the websocket writer clones buffered envelopes before it serializes
or sends them.
- **Use a `Semaphore` only when serialization is intentional across
async work.** The test serialization guards intentionally span awaited
setup or execution, so using a semaphore communicates "one at a time"
without holding a mutex guard.
- **Remove the mutex when there is only one owner.** The PTY stdin
writer task owns `stdin` directly; the old `Arc<Mutex<_>>` did not
protect shared access because nothing else had access to the writer.
- **Do not split locks that protect an atomic invariant.** This PR
deliberately leaves active-turn/turn-state paths alone because those
checks and mutations need to stay atomic. Those cases should be fixed
separately with a design change or documented with `#[expect]`.
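
The "drain work out of the lock" pattern from the list above can be sketched as follows. To stay dependency-free this uses `std::sync::Mutex` and a synchronous callback; the code in this PR deals with Tokio async mutexes, where the callback loop is where the `.await` would sit. `take_pending` and `flush` are hypothetical names.

```rust
use std::sync::Mutex;

/// Takes all queued items while holding the lock. The guard is dropped
/// when this function returns, before any follow-up work starts.
fn take_pending(pending: &Mutex<Vec<String>>) -> Vec<String> {
    let mut guard = pending.lock().expect("pending lock poisoned");
    std::mem::take(&mut *guard)
}

/// Notifies for each drained item with no lock held. In async code this
/// loop is where the awaited send would happen.
fn flush(pending: &Mutex<Vec<String>>, mut notify: impl FnMut(String)) {
    let drained = take_pending(pending);
    for item in drained {
        notify(item);
    }
}
```

Because the guard never outlives `take_pending`, there is nothing for an await-holding lint to flag in the caller.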
## What changed
- Narrow scoped async mutex guards in app-server, JS REPL, network
approval, remote-control websocket, and the RMCP test server.
- Replace test-only async mutex serialization guards with semaphores
where the guard intentionally lives across async work.
- Let the PTY pipe writer task own stdin directly instead of wrapping it
in an async mutex.
## Verification
- `just fix -p codex-core -p codex-app-server -p codex-rmcp-client -p
codex-shell-escalation -p codex-utils-pty -p codex-utils-readiness`
- `just clippy -p codex-core`
- `cargo test -p codex-core -p codex-app-server -p codex-rmcp-client -p
codex-shell-escalation -p codex-utils-pty -p codex-utils-readiness` was
run; the app-server suite passed, and `codex-core` failed in the local
sandbox on six otel approval tests plus
`suite::user_shell_cmd::user_shell_command_does_not_set_network_sandbox_env_var`,
which appear to depend on local command approval/default rules and
`CODEX_SANDBOX_NETWORK_DISABLED=1` in this environment.
## Summary
First PR in the split from #17956.
- adds the core/app-server `RateLimitReachedType` shape
- maps backend `rate_limit_reached_type` into Codex rate-limit snapshots
- carries the field through app-server notifications/responses and
generated schemas
- updates existing constructors/tests for the new optional field
## Validation
- `cargo test -p codex-backend-client`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server rate_limits`
- `cargo test -p codex-tui workspace_`
- `cargo test -p codex-tui status_`
- `just fmt`
- `just fix -p codex-backend-client`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
## Summary
- Add a pushed `ExecProcessEvent` stream alongside retained
`process/read` output.
- Publish local and remote output, exit, close, and failure events.
- Cover the event stream with shared local/remote exec process tests.
## Testing
- `cargo check -p codex-exec-server`
- `cargo check -p codex-rmcp-client`
- Not run: `cargo test` per repo instruction; CI will cover.
## Stack
```text
o #18027 [6/6] Fail exec client operations after disconnect
│
o #18212 [5/6] Wire executor-backed MCP stdio
│
o #18087 [4/6] Abstract MCP stdio server launching
│
@ #18020 [3/6] Add pushed exec process events
│
o #18086 [2/6] Support piped stdin in exec process API
│
o #18085 [1/6] Add MCP server environment config
│
o main
```
---------
Co-authored-by: Codex <noreply@openai.com>
To improve the performance of UI loads from the app, this adds two main
improvements:
1. The `thread/list` API now accepts a `sortDirection` request field and
returns a `backwardsCursor` in the response, which lets you paginate
forwards and backwards from a window. You can fetch the first few items
to display immediately while paginating to fill in history, then
paginate "backwards" on future loads to catch up with any changes since
the last UI load without reloading the entire data set.
2. A new `thread/turns/list` API also supports `sortDirection` and
`backwardsCursor`, with the same behavior as `thread/list`: a small
fetch for immediate display, followed by background fill-in and resync
catch-up.
## Why
Unused imports in `core/tests/suite/unified_exec.rs` in the Windows
build were not caught by Bazel CI on
https://github.com/openai/codex/pull/18096. I spot-checked
https://github.com/openai/codex/actions/workflows/rust-ci-full.yml?query=branch%3Amain
and noticed that builds were consistently red. This revealed that our
Cargo builds _were_ properly catching these issues, identifying a
Windows-specific coverage hole in the Bazel clippy job.
The Windows Bazel clippy job uses `--skip_incompatible_explicit_targets`
so it can lint a broad target set without failing immediately on targets
that are genuinely incompatible with Windows. However, with the default
Windows host platform, `rust_test` targets such as
`//codex-rs/core:core-all-test` could be skipped before the clippy
aspect reached their integration-test modules. As a result, the imports
in `core/tests/suite/unified_exec.rs` were not being linted by the
Windows Bazel clippy job at all.
The clippy diagnostic that Windows Bazel should have surfaced was:
```text
error: unused import: `codex_config::Constrained`
 --> core\tests\suite\unified_exec.rs:8:5
  |
8 | use codex_config::Constrained;
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: `-D unused-imports` implied by `-D warnings`
  = help: to override `-D warnings` add `#[allow(unused_imports)]`
error: unused import: `codex_protocol::permissions::FileSystemAccessMode`
  --> core\tests\suite\unified_exec.rs:11:5
   |
11 | use codex_protocol::permissions::FileSystemAccessMode;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
error: unused import: `codex_protocol::permissions::FileSystemPath`
  --> core\tests\suite\unified_exec.rs:12:5
   |
12 | use codex_protocol::permissions::FileSystemPath;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
error: unused import: `codex_protocol::permissions::FileSystemSandboxEntry`
  --> core\tests\suite\unified_exec.rs:13:5
   |
13 | use codex_protocol::permissions::FileSystemSandboxEntry;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
error: unused import: `codex_protocol::permissions::FileSystemSandboxPolicy`
  --> core\tests\suite\unified_exec.rs:14:5
   |
14 | use codex_protocol::permissions::FileSystemSandboxPolicy;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
## What changed
- Run the Windows Bazel clippy job with the MSVC host platform via
`--windows-msvc-host-platform`, matching the Windows Bazel test job.
This keeps `--skip_incompatible_explicit_targets` while ensuring Windows
`rust_test` targets such as `//codex-rs/core:core-all-test` are still
linted.
- Remove the unused imports from `core/tests/suite/unified_exec.rs`.
- Add `--print-failed-action-summary` to
`.github/scripts/run-bazel-ci.sh` so Bazel action failures can be
summarized after the build exits.
## Failure reporting
Once the coverage issue was fixed, an intentionally reintroduced unused
import made the Windows Bazel clippy job fail as expected. That exposed
a separate usability problem: because the job keeps `--keep_going`, the
top-level Bazel output could still end with:
```text
ERROR: Build did NOT complete successfully
FAILED:
```
without the underlying rustc/clippy diagnostic being visible in the
obvious part of the GitHub Actions log.
To keep `--keep_going` while making failures actionable, the wrapper now
scans the captured Bazel console output for failed actions and prints
the matching rustc/clippy diagnostic block. When a diagnostic block is
found, it is emitted both as a GitHub `::error` annotation and as plain
expanded log output, rather than being hidden in a collapsed group.
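As a rough illustration of the scan-and-surface step described above, the following sketch pulls rustc/clippy diagnostic blocks out of captured console output and formats one as a GitHub `::error` annotation. It is not the wrapper's actual logic: the function names, the rule that a block starts at an `error:` line, and the list of Bazel status prefixes that end a block are all assumptions made for the example. The escaping of `%`, `\r`, and `\n` follows the GitHub Actions workflow-command rules for message data.

```rust
/// Hypothetical helper: true for top-level Bazel status lines, which this
/// sketch treats as the end of any rustc/clippy diagnostic block.
fn is_bazel_status_line(line: &str) -> bool {
    ["ERROR:", "INFO:", "WARNING:", "FAILED:"]
        .iter()
        .any(|prefix| line.starts_with(prefix))
}

/// Collects diagnostic blocks from captured console output. A block starts
/// at a line beginning with `error:` or `error[` and runs until a blank
/// line, a Bazel status line, or the next diagnostic (assumed heuristic).
pub fn extract_diagnostics(log: &str) -> Vec<String> {
    let mut blocks = Vec::new();
    let mut current: Option<Vec<&str>> = None;
    for line in log.lines() {
        let starts_diag = line.starts_with("error:") || line.starts_with("error[");
        let ends_diag = line.trim().is_empty() || is_bazel_status_line(line);
        if starts_diag {
            // A new diagnostic closes any block that is still open.
            if let Some(block) = current.take() {
                blocks.push(block.join("\n"));
            }
            current = Some(vec![line]);
        } else if ends_diag {
            if let Some(block) = current.take() {
                blocks.push(block.join("\n"));
            }
        } else if let Some(block) = current.as_mut() {
            block.push(line);
        }
    }
    if let Some(block) = current.take() {
        blocks.push(block.join("\n"));
    }
    blocks
}

/// Formats a diagnostic block as a single-line GitHub `::error` workflow
/// command, escaping characters that would otherwise break the command.
pub fn github_error_annotation(block: &str) -> String {
    let escaped = block
        .replace('%', "%25")
        .replace('\r', "%0D")
        .replace('\n', "%0A");
    format!("::error::{escaped}")
}
```

Printing each extracted block twice, once through `github_error_annotation` and once verbatim, would mirror the "annotation plus plain expanded log output" behavior described above.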
## Verification
To validate the CI path, I intentionally introduced an unused import in
`core/tests/suite/unified_exec.rs`. The Windows Bazel clippy job failed
as expected, confirming that the integration-test module is now covered
by Bazel clippy. The same failure also verified that the wrapper
surfaces the matching clippy diagnostics directly in the Actions output.
## Summary
- Normalize deferred MCP and dynamic tools into `ToolSearchEntry` values
before constructing `ToolSearchHandler`.
- Move the tool-search entry adapter out of `tools/handlers` and into
`tools/tool_search_entry.rs` so the handlers directory stays focused on
handlers.
- Keep `ToolSearchHandler` operating over one generic entry list for
BM25 search, namespace grouping, and per-bucket default limits.
## Why
Follow-up cleanup for #17849. The dynamic tool-search support made the
handler juggle source-specific MCP and dynamic tool lists, index
arithmetic, output conversion, and namespace emission. This keeps source
adaptation outside the handler so the search loop itself is smaller and
source-agnostic.
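The shape of this refactor can be sketched as follows. Everything here is illustrative and assumed for the example: the fields on `ToolSearchEntry`, the `McpTool` and `DynamicTool` structs, the `"dynamic"` fallback namespace, and the helper names are not taken from the codebase, and a plain substring match stands in for the BM25 ranking that the real handler uses.

```rust
use std::collections::BTreeMap;

/// Hypothetical source-agnostic entry; the real `ToolSearchEntry` fields
/// are not shown in this PR description.
#[derive(Clone, Debug, PartialEq)]
pub struct ToolSearchEntry {
    pub namespace: String,
    pub name: String,
    pub search_text: String,
}

/// Hypothetical source-specific tool shapes.
pub struct McpTool {
    pub server: String,
    pub name: String,
    pub description: String,
}

pub struct DynamicTool {
    pub namespace: Option<String>,
    pub name: String,
    pub description: String,
}

impl From<McpTool> for ToolSearchEntry {
    fn from(tool: McpTool) -> Self {
        ToolSearchEntry {
            search_text: format!("{} {}", tool.name, tool.description),
            namespace: tool.server,
            name: tool.name,
        }
    }
}

impl From<DynamicTool> for ToolSearchEntry {
    fn from(tool: DynamicTool) -> Self {
        ToolSearchEntry {
            search_text: format!("{} {}", tool.name, tool.description),
            // Assumed fallback namespace for tools that do not declare one.
            namespace: tool.namespace.unwrap_or_else(|| "dynamic".to_string()),
            name: tool.name,
        }
    }
}

/// Source adaptation happens here, before any handler is constructed.
pub fn build_entries(mcp: Vec<McpTool>, dynamic: Vec<DynamicTool>) -> Vec<ToolSearchEntry> {
    mcp.into_iter()
        .map(ToolSearchEntry::from)
        .chain(dynamic.into_iter().map(ToolSearchEntry::from))
        .collect()
}

/// The search loop sees one generic list. Substring matching is a
/// placeholder for BM25; results are grouped by namespace.
pub fn search_grouped(entries: &[ToolSearchEntry], query: &str) -> BTreeMap<String, Vec<String>> {
    let needle = query.to_lowercase();
    let mut grouped: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for entry in entries {
        if entry.search_text.to_lowercase().contains(&needle) {
            grouped
                .entry(entry.namespace.clone())
                .or_default()
                .push(entry.name.clone());
        }
    }
    grouped
}
```

The point of the sketch is the boundary: `search_grouped` never needs to know whether an entry came from MCP or from a dynamic tool, which is what lets the handler drop its per-source index arithmetic.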
## Validation
- `just fmt`
- `cargo test -p codex-core tools::handlers::tool_search::tests`
- `git diff --check`
- `cargo test -p codex-core` currently fails in the unrelated test
`plugins::manager::tests::list_marketplaces_ignores_installed_roots_missing_from_config`;
rerunning that single test fails the same way at
`core/src/plugins/manager_tests.rs:1692`.
---------
Co-authored-by: pash <pash@openai.com>