## Why
Clients need a stable app-server protocol surface for enrolling a local
device key, retrieving its public key, and producing a device-bound
proof.
The protocol reports `protectionClass` explicitly so clients can
distinguish hardware-backed keys from an explicitly allowed OS-protected
fallback. Signing uses a tagged `DeviceKeySignPayload` enum rather than
arbitrary bytes so each signed statement is auditable at the API
boundary.
## What changed
- Added v2 JSON-RPC methods for `device/key/create`,
`device/key/public`, and `device/key/sign`.
- Added request/response types for device-key metadata, SPKI public
keys, protection classes, and ECDSA signatures.
- Added `DeviceKeyProtectionPolicy` with hardware-only default behavior
and an explicit `allow_os_protected_nonextractable` option.
- Added the initial `remoteControlClientConnection` signing payload
variant.
- Regenerated JSON Schema and TypeScript fixtures for app-server
clients.
## Stack
This is PR 1 of 4 in the device-key app-server stack.
## Validation
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
## Summary
- Move external agent config migration logic and tests from `codex-core`
into `app-server/src/config`.
- Keep the migration service crate-private to app-server and update the
API adapter imports.
- Remove stale core re-exports and expose only the needed marketplace
source helper.
## Testing
- `cargo test -p codex-app-server config::external_agent_config`
- `just fmt`
- `just fix -p codex-app-server`
- `just fix -p codex-core`
- `git diff --check`
Deferred dynamic tools need to round-trip a namespace so a tool returned
by `tool_search` can be called through the same registry key that core
uses for dispatch.
This change adds namespace support for dynamic tool specs/calls,
persists it through app-server thread state, and routes dynamic tool
calls by full `ToolName` while still sending the app the leaf tool name.
Deferred dynamic tools must provide a namespace; non-deferred dynamic
tools may remain top-level.
It also introduces `LoadableToolSpec` as the shared
function-or-namespace Responses shape used by both `tool_search` output
and dynamic tool registration, so dynamic tools use the same wrapping
logic in both paths.
Validation:
- `cargo test -p codex-tools`
- `cargo test -p codex-core tool_search`
---------
Co-authored-by: Sayan Sisodiya <sayan@openai.com>
## Why
This PR prepares the stack to enable Clippy await-holding lints that
were left disabled in #18178. The mechanical lock-scope cleanup is
handled separately; this PR is the documentation/configuration layer for
the remaining await-across-guard sites.
Without explicit annotations, reviewers and future maintainers cannot
tell whether an await-holding warning is a real concurrency smell or an
intentional serialization boundary.
## What changed
- Configures `clippy.toml` so `await_holding_invalid_type` also covers
`tokio::sync::{MutexGuard,RwLockReadGuard,RwLockWriteGuard}`.
- Adds targeted `#[expect(clippy::await_holding_invalid_type, reason =
...)]` annotations for intentional async guard lifetimes.
- Documents the main categories of intentional cases: active-turn state
transitions that must remain atomic, session-owned MCP manager accesses,
remote-control websocket serialization, JS REPL kernel/process
serialization, OAuth persistence, external bearer token refresh
serialization, and tests that intentionally serialize shared global or
session-owned state.
- For external bearer token refresh, documents the existing
serialization boundary: holding `cached_token` across the provider
command prevents concurrent cache misses from starting duplicate refresh
commands, and the current behavior is small enough that an explicit
expectation is easier to maintain than adding another synchronization
primitive.
## Verification
- `cargo clippy -p codex-login --all-targets`
- `cargo clippy -p codex-connectors --all-targets`
- `cargo clippy -p codex-core --all-targets`
- The follow-up PR #18698 enables `await_holding_invalid_type` and
`await_holding_lock` as workspace `deny` lints, so any undocumented
remaining offender will fail Clippy.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18423).
* #18698
* __->__ #18423
## Why
Customers need finer-grained control over allowed sandbox modes based on
the host Codex is running on. For example, they may want stricter
sandbox limits on devboxes while keeping a different default elsewhere.
Our current cloud requirements can target user/account groups, but they
cannot vary sandbox requirements by host. That makes remote development
environments awkward because the same top-level `allowed_sandbox_modes`
has to apply everywhere.
## What
Adds a new `remote_sandbox_config` section to `requirements.toml`:
```toml
allowed_sandbox_modes = ["read-only"]
[[remote_sandbox_config]]
hostname_patterns = ["*.org"]
allowed_sandbox_modes = ["read-only", "workspace-write"]
[[remote_sandbox_config]]
hostname_patterns = ["*.sh", "runner-*.ci"]
allowed_sandbox_modes = ["read-only", "danger-full-access"]
```
During requirements resolution, Codex resolves the local host name once,
preferring the machine FQDN when available and falling back to the
cleaned kernel hostname. This host classification is best effort rather
than authenticated device proof.
Each requirements source applies its first matching
`remote_sandbox_config` entry before it is merged with other sources.
The shared merge helper keeps that `apply_remote_sandbox_config` step
paired with requirements merging so new requirements sources do not have
to remember the extra call.
That preserves source precedence: a lower-precedence requirements file
with a matching `remote_sandbox_config` cannot override a
higher-precedence source that already set `allowed_sandbox_modes`.
This also wires the hostname-aware resolution through app-server,
CLI/TUI config loading, config API reads, and config layer metadata so
they all evaluate remote sandbox requirements consistently.
## Verification
- `cargo test -p codex-config remote_sandbox_config`
- `cargo test -p codex-config host_name`
- `cargo test -p codex-core
load_config_layers_applies_matching_remote_sandbox_config`
- `cargo test -p codex-core
system_remote_sandbox_config_keeps_cloud_sandbox_modes`
- `cargo test -p codex-config`
- `cargo test -p codex-core` unit tests passed; `tests/all.rs`
integration matrix was intentionally stopped after the relevant focused
tests passed
- `just fix -p codex-config`
- `just fix -p codex-core`
- `cargo check -p codex-app-server`
## Summary
Making thread id optional so that we can better cache resources for MCPs
for connectors since their resource templates is universal and not
particular to projects.
- Make `mcpServer/resource/read` accept an optional `threadId`
- Read resources from the current MCP config when no thread is supplied
- Keep the existing thread-scoped path when `threadId` is present
- Update the generated schemas, README, and integration coverage
## Testing
- `just write-app-server-schema`
- `just fmt`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-mcp`
- `cargo test -p codex-app-server --test all mcp_resource`
- `just fix -p codex-mcp`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
## Why
#18274 made `PermissionProfile` the canonical file-system permissions
shape, but the round-trip from `FileSystemSandboxPolicy` to
`PermissionProfile` still dropped one piece of policy metadata:
`glob_scan_max_depth`.
That field is security-relevant for deny-read globs such as `**/*.env`.
On Linux, bubblewrap sandbox construction uses it to bound unreadable
glob expansion. If a profile copied from active runtime permissions
loses this value and is submitted back as an override, the resulting
`FileSystemSandboxPolicy` can behave differently even though the visible
permission entries look equivalent.
## What changed
- Add `glob_scan_max_depth` to protocol `FileSystemPermissions` and
preserve it when converting to/from `FileSystemSandboxPolicy`.
- Keep legacy `read`/`write` JSON for simple path-only permissions, but
force canonical JSON when glob scan depth is present so the metadata is
not silently dropped.
- Carry `globScanMaxDepth` through app-server
`AdditionalFileSystemPermissions`, generated JSON/TypeScript schemas,
and app-server/TUI conversion call sites.
- Preserve the metadata through sandboxing permission normalization,
merging, and intersection.
- Carry the merged scan depth into the effective
`FileSystemSandboxPolicy` used for command execution, so bounded
deny-read globs reach Linux bubblewrap materialization.
## Verification
- `cargo test -p codex-sandboxing glob_scan -- --nocapture`
- `cargo test -p codex-sandboxing policy_transforms -- --nocapture`
- `just fix -p codex-sandboxing`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18713).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* #18279
* #18278
* #18277
* #18276
* #18275
* __->__ #18713
## Why
Codex needs a first-class `amazon-bedrock` model provider so users can
select Bedrock without copying a full provider definition into
`config.toml`. The provider has Codex-owned defaults for the pieces that
should stay consistent across users: the display `name`, Bedrock
`base_url`, and `wire_api`.
At the same time, users still need a way to choose the AWS credential
profile used by their local environment. This change makes
`amazon-bedrock` a partially modifiable built-in provider: code owns the
provider identity and endpoint defaults, while user config can set
`model_providers.amazon-bedrock.aws.profile`.
For example:
```toml
model_provider = "amazon-bedrock"
[model_providers.amazon-bedrock.aws]
profile = "codex-bedrock"
```
## What Changed
- Added `amazon-bedrock` to the built-in model provider map with:
- `name = "Amazon Bedrock"`
- `base_url = "https://bedrock-mantle.us-east-1.api.aws/v1"`
- `wire_api = "responses"`
- Added AWS provider auth config with a profile-only shape:
`model_providers.<id>.aws.profile`.
- Kept AWS auth config restricted to `amazon-bedrock`; custom providers
that set `aws` are rejected.
- Allowed `model_providers.amazon-bedrock` through reserved-provider
validation so it can act as a partial override.
- During config loading, only `aws.profile` is copied from the
user-provided `amazon-bedrock` entry onto the built-in provider. Other
Bedrock provider fields remain hard-coded by the built-in definition.
- Updated the generated config schema for the new provider AWS profile
config.
## Why
Cloud-hosted sessions need a way for the service that starts or manages
a thread to provide session-owned config without treating all config as
if it came from the same user/project/workspace TOML stack.
The important boundary is ownership: some values should be controlled by
the session/orchestrator, some by the authenticated user, and later some
may come from the executor. The earlier broad config-store shape made
that boundary too fuzzy and overlapped heavily with the existing
filesystem-backed config loader. This PR starts with the smaller piece
we need now: a typed session config loader that can feed the existing
config layer stack while preserving the normal precedence and merge
behavior.
## What Changed
- Added `ThreadConfigLoader` and related typed payloads in
`codex-config`.
- `SessionThreadConfig` currently supports `model_provider`,
`model_providers`, and feature flags.
- `UserThreadConfig` is present as an ownership boundary, but does not
yet add TOML-backed fields.
- `NoopThreadConfigLoader` preserves existing behavior when no external
loader is configured.
- `StaticThreadConfigLoader` supports tests and simple callers.
- Taught thread config sources to produce ordinary `ConfigLayerEntry`
values so the existing `ConfigLayerStack` remains the place where
precedence and merging happen.
- Wired the loader through `ConfigBuilder`, the config loader, and
app-server startup paths so app-server can provide session-owned config
before deriving a thread config.
- Added coverage for:
- translating typed thread config into config layers,
- inserting thread config layers into the stack at the right precedence,
- applying session-provided model provider and feature settings when
app-server derives config from thread params.
## Follow-Ups
This intentionally stops short of adding the remote/service transport.
The next pieces are expected to be:
1. Define the proto/API shape for this interface.
2. Add a client implementation that can source session config from the
service side.
## Verification
- Added unit coverage in `codex-config` for the loader and layer
conversion.
- Added `codex-core` config loader coverage for thread config layer
precedence.
- Added app-server coverage that verifies session thread config wins
over request-provided config for model provider and feature settings.
## Summary
Adds a second realtime v2 function tool, `remain_silent`, so the
realtime model has an explicit non-speaking action when the
collaboration mode or latest context says it should not answer aloud.
This is stacked on #18597.
## Design
- Advertise `remain_silent` alongside `background_agent` in realtime v2
conversational sessions.
- Parse `remain_silent` function calls into a typed
`RealtimeEvent::NoopRequested` event.
- Have core answer that function call with an empty
`function_call_output` and deliberately avoid `response.create`, so no
follow-up realtime response is requested.
- Keep the event hidden from app-server/TUI surfaces; it is operational
plumbing, not user-visible conversation content.
Migrate the conversation summary App Server methods to ThreadStore
Because this app server api allows explicitly fetching the thread by
rollout path, intercept that case in the app server code and (a) route
directly to underlying local thread store methods if we're using a local
thread store, or (b) throw an unsupported error if we're using a remote
thread store. This keeps the thread store API clean and all filesystem
operations inside of the local thread store, which pushing the
"fundamental incompatibility" check as early as possible.
## Summary
This PR aims to improve integration between the realtime model and the
codex agent by sharing more context with each other. In particular, we
now share full realtime conversation transcript deltas in addition to
the delegation message.
realtime_conversation.rs now turns a handoff into:
```
<realtime_delegation>
<input>...</input>
<transcript_delta>...</transcript_delta>
</realtime_delegation>
```
## Implementation notes
The transcript is accumulated in the realtime websocket layer as parsed
realtime events arrive. When a background-agent handoff is requested,
the current transcript snapshot is copied onto the handoff event and
then serialized by `realtime_conversation.rs` into the hidden realtime
delegation envelope that Codex receives as user-turn context.
For Realtime V2, the session now explicitly enables input audio
transcription, and the parser handles the relevant input/output
transcript completion events so the snapshot includes both user speech
and realtime model responses. The delegation `<input>` remains the
actual handoff request, while `<transcript_delta>` carries the
surrounding conversation history for context.
Reviewers should note that the transcript payload is intended for Codex
context sharing, not UI rendering. The realtime delegation envelope
should stay hidden from the user-facing transcript surface, while still
being included in the background-agent turn so Codex can answer with the
same conversational context the realtime model had.
Wires patch_updated events through app_server. These events are parsed
and streamed while apply_patch is being written by the model. Also adds 500ms of buffering to the patch_updated events in the diff_consumer.
The eventual goal is to use this to display better progress indicators in
the codex app.
- Replace the active models-manager catalog with the deleted core
catalog contents.
- Replace stale hardcoded test model slugs with current bundled model
slugs.
- Keep this as a stacked change on top of the cleanup PR.
## Why
`PermissionProfile` needs stable, canonical file-system semantics before
it can become the primary runtime permissions abstraction. Without a
canonical form, callers have to keep re-deriving legacy sandbox maps and
profile comparisons remain lossy or order-dependent.
## What changed
This adds canonicalization helpers for `FileSystemPermissions` and
`PermissionProfile`, expands special paths into explicit sandbox
entries, and updates permission request/conversion paths to consume
those canonical entries. It also tightens the legacy bridge so root-wide
write profiles with narrower carveouts are not silently projected as
full-disk legacy access.
## Verification
- `cargo test -p codex-protocol
root_write_with_read_only_child_is_not_full_disk_write -- --nocapture`
- `cargo test -p codex-sandboxing permission -- --nocapture`
- `cargo test -p codex-tui permissions -- --nocapture`
- Migrates unloaded `thread/name/set` and `thread/memoryModeSet`
app-server writes behind the generic
`ThreadStore::update_thread_metadata` API rather than adding one-off
store methods for setting thread name or memory mode.
- Implements the local ThreadStore metadata patch path for thread name
and memory mode, including rollout append, legacy name index updates,
SessionMeta validation/update, SQLite reconciliation, and re-reading the
stored thread.
- Adds focused local thread-store unit coverage plus app-server
integration coverage for the migrated unloaded write paths.
## Summary
Introduces a single background/control-plane agent task for ChatGPT
backend requests that do not have a thread-scoped task, with
`AuthManager` owning the default ChatGPT backend authorization decision.
Callers now ask `AuthManager` for the default ChatGPT backend
authorization header. `AuthManager` decides whether that is bearer or
background AgentAssertion based on config/internal state, while
low-level bootstrap paths can explicitly request bearer-only auth.
This PR is stacked on PR4 and focuses on the shared background task auth
plumbing plus the first tranche of backend/control-plane consumers. The
remaining callsite wiring is split into PR4.2 to keep review size down.
## Stack
- PR1: https://github.com/openai/codex/pull/17385 - add
`features.use_agent_identity`
- PR2: https://github.com/openai/codex/pull/17386 - register agent
identities when enabled
- PR3: https://github.com/openai/codex/pull/17387 - register agent tasks
when enabled
- PR3.1: https://github.com/openai/codex/pull/17978 - persist and
prewarm registered tasks per thread
- PR4: https://github.com/openai/codex/pull/17980 - use task-scoped
`AgentAssertion` for downstream calls
- PR4.1: this PR - introduce AuthManager-owned background/control-plane
`AgentAssertion` auth
- PR4.2: https://github.com/openai/codex/pull/18260 - use background
task auth for additional backend/control-plane calls
## What Changed
- add background task registration and assertion minting inside
`codex-login`
- persist `agent_identity.background_task_id` separately from
per-session task state
- make `BackgroundAgentTaskManager` private to `codex-login`; call sites
do not instantiate or pass it around
- teach `AuthManager` the ChatGPT backend base URL and feature-derived
background auth mode from resolved config
- expose bearer-only helpers for bootstrap/registration/refresh-style
paths that must not use AgentAssertion
- wire `AuthManager` default ChatGPT authorization through app listing,
connector directory listing, remote plugins, MCP status/listing,
analytics, and core-skills remote calls
- preserve bearer fallback when the feature is disabled, the backend
host is unsupported, or background task registration is not available
## Validation
- `just fmt`
- `cargo check -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `cargo test -p codex-login agent_identity`
- `cargo test -p codex-model-provider bearer_auth_provider`
- `cargo test -p codex-core agent_assertion`
- `cargo test -p codex-app-server remote_control`
- `cargo test -p codex-cloud-requirements fetch_cloud_requirements`
- `cargo test -p codex-models-manager manager::tests`
- `cargo test -p codex-chatgpt`
- `cargo test -p codex-cloud-tasks`
- `just fix -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `just fix -p codex-app-server`
- `git diff --check`
## Why
This addresses the review comment from #17751 about `marketplace/remove`
app-server test portability:
https://github.com/openai/codex/pull/17751#discussion_r3104378613
The API returns the removed installed root using the app-server's
effective `CODEX_HOME`. On macOS, temporary directory paths can appear
as either `/var/...` or `/private/var/...`, so comparing one raw path
against another can fail even when `marketplace/remove` behaves
correctly.
## What changed
- Removed the direct whole-response equality assertion for the installed
root path.
- Asserted the stable response field, `marketplace_name`, directly.
- Compared the expected and returned installed-root paths after
canonicalizing their existing parent directories, which avoids requiring
the removed leaf directory to still exist.
## Verification
- `cargo test -p codex-app-server
marketplace_remove_deletes_config_and_installed_root`
- `cargo test -p codex-app-server marketplace_remove`
## Summary
Add a new app-server `marketplace/remove` RPC on top of the shared
marketplace-remove implementation.
This change:
- adds `MarketplaceRemoveParams` / `MarketplaceRemoveResponse` to the
app-server protocol
- wires the new request through `codex_message_processor`
- reuses the shared core marketplace-remove flow from the stacked
refactor PR
- updates generated schema files and adds focused app-server coverage
## Validation
- `just write-app-server-schema`
- `just fmt`
- heavy compile/test coverage deferred to GitHub CI per request
## Summary
- persist registered agent tasks in the session state update stream so
the thread can reuse them
- prewarm task registration once identity registration succeeds, while
keeping startup failures best-effort
- isolate the session-side task lifecycle into a dedicated module so
AgentIdentityManager and RegisteredAgentTask do not leak across as many
core layers
## Testing
- cargo test -p codex-core startup_agent_task_prewarm
- cargo test -p codex-core
cached_agent_task_for_current_identity_clears_stale_task
- cargo test -p codex-core record_initial_history_
## Summary
- Add the executor-backed RMCP stdio transport.
- Wire MCP stdio placement through the executor environment config.
- Cover local and executor-backed stdio paths with the existing MCP test
helpers.
## Stack
```text
o #18027 [6/6] Fail exec client operations after disconnect
│
@ #18212 [5/6] Wire executor-backed MCP stdio
│
o #18087 [4/6] Abstract MCP stdio server launching
│
o #18020 [3/6] Add pushed exec process events
│
o #18086 [2/6] Support piped stdin in exec process API
│
o #18085 [1/6] Add MCP server environment config
│
o main
```
---------
Co-authored-by: Codex <noreply@openai.com>
Adds max_context_window to model metadata and routes core context-window
reads through resolved model info. Config model_context_window overrides
are clamped to max_context_window when present; without an override, the
model context_window is used.
## Summary
- Populate `PluginDetail.description` in core for uninstalled cross-repo
plugins when detailed fields are unavailable until install.
- Include the source Git URL plus optional path/ref/sha details in that
fallback description.
- Keep `details_unavailable_reason` as the structured signal while
app-server forwards the description normally.
- Add plugin-read coverage proving the response does not clone the
remote source just to show the message.
## Why
Uninstalled cross-repo plugins intentionally return sparse detail data
so listing/reading does not clone the plugin source. Without a
description, Desktop and TUI detail pages look like an ordinary empty
plugin. This gives users a concrete explanation and source pointer while
keeping the existing structured reason available for callers.
## Validation
- `just fmt`
- `cargo test -p codex-core
read_plugin_for_config_uninstalled_git_source_requires_install_without_cloning`
- `cargo test -p codex-app-server plugin_read --test all`
- `just fix -p codex-core`
- `just fix -p codex-app-server`
Note: `cargo test -p codex-app-server` was also attempted before the
latest refactor and failed broadly in unrelated v2
thread/realtime/review/skills suites; the new plugin-read test passed in
that run as well.
Cap the model-visible skills section to a small share of the context
window, with a fallback character budget, and keep only as many implicit
skills as fit within that budget.
Emit a non-fatal warning when enabled skills are omitted, and add a new
app-server warning notification
Record thread-start skill metrics for total enabled skills, kept skills,
and whether truncation happened
---------
Co-authored-by: Matthew Zeng <mzeng@openai.com>
Co-authored-by: Codex <noreply@openai.com>
## Summary
- trust-gate project `.codex` layers consistently, including repos that
have `.codex/hooks.json` or `.codex/execpolicy/*.rules` but no
`.codex/config.toml`
- keep disabled project layers in the config stack so nested trusted
project layers still resolve correctly, while preventing hooks and exec
policies from loading until the project is trusted
- update app-server/TUI onboarding copy to make the trust boundary
explicit and add regressions for loader, hooks, exec-policy, and
onboarding coverage
## Security
Before this change, an untrusted repo could auto-load project hooks or
exec policies from `.codex/` as long as `config.toml` was absent. This
makes trust the single gate for project-local config, hooks, and exec
policies.
## Stack
- Parent of #15936
## Test
- cargo test -p codex-core without_config_toml
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Update the plugin API for the new remote plugin model.
The mental model is no longer “keep local plugin state in sync with
remote.” Instead, local and remote plugins are becoming separate
sources. Remote catalog entries can be shown directly from the remote
API before installation; after installation they are still downloaded
into the local cache for execution, but remote installed state will come
from the API and be held in memory rather than being read from config.
• ## API changes
- Remove `forceRemoteSync` from `plugin/list`, `plugin/install`, and
`plugin/uninstall`.
- Remove `remoteSyncError` from `plugin/list`.
- Add remote-capable metadata to `plugin/list` / `plugin/read`:
- nullable `marketplaces[].path`
- `source: { type: "remote", downloadUrl }`
- URL asset fields alongside local path fields:
`composerIconUrl`, `logoUrl`, `screenshotUrls`
- Make `plugin/read` and `plugin/install` source-compatible:
- `marketplacePath?: AbsolutePathBuf | null`
- `remoteMarketplaceName?: string | null`
- exactly one source is required at runtime
## Why
The large Rust test suites are slow and include some of our flakiest
tests, so we want to run them with Bazel native sharding while keeping
shard membership stable between runs.
This is the simpler follow-up to the explicit-label experiment in
#17998. Since #18397 upgraded Codex to `rules_rs` `0.0.58`, which
includes the stable test-name hashing support from
hermeticbuild/rules_rust#14, this PR only needs to wire Codex's Bazel
macros into that support.
Using native sharding preserves BuildBuddy's sharded-test UI and Bazel's
per-shard test action caching. Using stable name hashing avoids
reshuffling every test when one test is added or removed.
## What Changed
`codex_rust_crate` now accepts `test_shard_counts` and applies the right
Bazel/rules_rust attributes to generated unit and integration test
rules. Matched tests are also marked `flaky = True`, giving them Bazel's
default three attempts.
This PR shards these labels 8 ways:
```text
//codex-rs/core:core-all-test
//codex-rs/core:core-unit-tests
//codex-rs/app-server:app-server-all-test
//codex-rs/app-server:app-server-unit-tests
//codex-rs/tui:tui-unit-tests
```
## Verification
`bazel query --output=build` over the selected public labels and their
inner unit-test binaries confirmed the expected `shard_count = 8`,
`flaky = True`, and `experimental_enable_sharding = True` attributes.
Also verified that we see the shards as expected in BuildBuddy so they
can be analyzed independently.
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add first-class marketplace support for git-backed plugin sources
- keep the newer marketplace parsing behavior from `main`, including
alternate manifest locations and string local sources
- materialize remote plugin sources during install, detail reads, and
non-curated cache refresh
- expose git plugin source metadata through the app-server protocol
## Details
This teaches the marketplace parser to accept all of the following:
- local string sources such as `"source": "./plugins/foo"`
- local object sources such as
`{"source":"local","path":"./plugins/foo"}`
- remote repo-root sources such as
`{"source":"url","url":"https://github.com/org/repo.git"}`
- remote subdir sources such as
`{"source":"git-subdir","url":"owner/repo","path":"plugins/foo","ref":"main","sha":"..."}`
It also preserves the newer tolerant behavior from `main`: invalid or
unsupported plugin entries are skipped instead of breaking the whole
marketplace.
## Validation
- `cargo test -p codex-core plugins::marketplace::tests`
- `just fix -p codex-core`
- `just fmt`
## Notes
- A full `cargo test -p codex-core` run still hit unrelated existing
failures in agent and multi-agent tests during this session; the
marketplace-focused suite passed after the rebase resolution.
Follow-up to https://github.com/openai/codex/pull/18178, where we called
out enabling the await-holding lint as a follow-up.
The long-term goal is to enable Clippy coverage for async guards held
across awaits. This PR is intentionally only the first, low-risk cleanup
pass: it narrows obvious lock guard lifetimes and leaves
`codex-rs/Cargo.toml` unchanged so the lint is not enabled until the
remaining cases are fixed or explicitly justified. It intentionally
leaves the active-turn/turn-state locking pattern alone because those
checks and mutations need to stay atomic.
## Common fixes used here
These are the main patterns reviewers should expect in this PR, and they
are also the patterns to reach for when fixing future `await_holding_*`
findings:
- **Scope the guard to the synchronous work.** If the code only needs
data from a locked value, move the lock into a small block, clone or
compute the needed values, and do the later `.await` after the block.
- **Use direct one-line mutations when there is no later await.** Cases
like `map.lock().await.remove(&id)` are acceptable when the guard is
only needed for that single mutation and the statement ends before any
async work.
- **Drain or clone work out of the lock before notifying or awaiting.**
For example, the JS REPL drains pending exec senders into a local vector
and the websocket writer clones buffered envelopes before it serializes
or sends them.
- **Use a `Semaphore` only when serialization is intentional across
async work.** The test serialization guards intentionally span awaited
setup or execution, so using a semaphore communicates "one at a time"
without holding a mutex guard.
- **Remove the mutex when there is only one owner.** The PTY stdin
writer task owns `stdin` directly; the old `Arc<Mutex<_>>` did not
protect shared access because nothing else had access to the writer.
- **Do not split locks that protect an atomic invariant.** This PR
deliberately leaves active-turn/turn-state paths alone because those
checks and mutations need to stay atomic. Those cases should be fixed
separately with a design change or documented with `#[expect]`.
## What changed
- Narrow scoped async mutex guards in app-server, JS REPL, network
approval, remote-control websocket, and the RMCP test server.
- Replace test-only async mutex serialization guards with semaphores
where the guard intentionally lives across async work.
- Let the PTY pipe writer task own stdin directly instead of wrapping it
in an async mutex.
## Verification
- `just fix -p codex-core -p codex-app-server -p codex-rmcp-client -p
codex-shell-escalation -p codex-utils-pty -p codex-utils-readiness`
- `just clippy -p codex-core`
- `cargo test -p codex-core -p codex-app-server -p codex-rmcp-client -p
codex-shell-escalation -p codex-utils-pty -p codex-utils-readiness` was
run; the app-server suite passed, and `codex-core` failed in the local
sandbox on six otel approval tests plus
`suite::user_shell_cmd::user_shell_command_does_not_set_network_sandbox_env_var`,
which appear to depend on local command approval/default rules and
`CODEX_SANDBOX_NETWORK_DISABLED=1` in this environment.
## Summary
First PR in the split from #17956.
- adds the core/app-server `RateLimitReachedType` shape
- maps backend `rate_limit_reached_type` into Codex rate-limit snapshots
- carries the field through app-server notifications/responses and
generated schemas
- updates existing constructors/tests for the new optional field
## Validation
- `cargo test -p codex-backend-client`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server rate_limits`
- `cargo test -p codex-tui workspace_`
- `cargo test -p codex-tui status_`
- `just fmt`
- `just fix -p codex-backend-client`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
To improve performance of UI loads from the app, add two main
improvements:
1. The `thread/list` api now gets a `sortDirection` request field and a
`backwardsCursor` to the response, which lets you paginate forwards and
backwards from a window. This lets you fetch the first few items to
display immediately while you paginate to fill in history, then can
paginate "backwards" on future loads to catch up with any changes since
the last UI load without a full reload of the entire data set.
2. Added a new `thread/turns/list` api which also has sortDirection and
backwardsCursor for the same behavior as `thread/list`, allowing you the
same small-fetch for immediate display followed by background fill-in
and resync catchup.
## Summary
- Normalize deferred MCP and dynamic tools into `ToolSearchEntry` values
before constructing `ToolSearchHandler`.
- Move the tool-search entry adapter out of `tools/handlers` and into
`tools/tool_search_entry.rs` so the handlers directory stays focused on
handlers.
- Keep `ToolSearchHandler` operating over one generic entry list for
BM25 search, namespace grouping, and per-bucket default limits.
## Why
Follow-up cleanup for #17849. The dynamic tool-search support made the
handler juggle source-specific MCP and dynamic tool lists, index
arithmetic, output conversion, and namespace emission. This keeps source
adaptation outside the handler so the search loop itself is smaller and
source-agnostic.
## Validation
- `just fmt`
- `cargo test -p codex-core tools::handlers::tool_search::tests`
- `git diff --check`
- `cargo test -p codex-core` currently fails in unrelated
`plugins::manager::tests::list_marketplaces_ignores_installed_roots_missing_from_config`;
rerunning that single test fails the same way at
`core/src/plugins/manager_tests.rs:1692`.
---------
Co-authored-by: pash <pash@openai.com>
Summary
- replace the thread/read persisted-load helper with
ThreadStore::read_thread
- move SQLite/rollout summary, name, fork metadata, and history loading
for persisted reads into LocalThreadStore
- leave getConversationSummary unchanged for a later PR
Context
- Replaces closed stacked PR #18232 after PR #18231 merged and its base
branch was deleted.
## Summary
- adds managed requirements support for deny-read filesystem entries
- constrains config layers so managed deny-read requirements cannot be
widened by user-controlled config
- surfaces managed deny-read requirements through debug/config plumbing
This PR lets managed requirements inject deny-read filesystem
constraints into the effective filesystem sandbox policy.
User-controlled config can still choose the surrounding permission
profile, but it cannot remove or weaken the managed deny-read entries.
## Managed deny-read shape
A managed requirements file can declare exact paths and glob patterns
under `[permissions.filesystem]`:
```toml
# /etc/codex/requirements.toml
[permissions.filesystem]
deny_read = [
"/Users/alice/.gitconfig",
"/Users/alice/.ssh",
"./managed-private/**/*.env",
]
```
Those entries are compiled into the effective filesystem policy as
`access = none` rules, equivalent in shape to filesystem permission
entries like:
```toml
[permissions.workspace.filesystem]
"/Users/alice/.gitconfig" = "none"
"/Users/alice/.ssh" = "none"
"/absolute/path/to/managed-private/**/*.env" = "none"
```
The important difference is that the managed entries come from
requirements, so lower-precedence user config cannot remove them or make
those paths readable again.
Relative managed `deny_read` entries are resolved relative to the
directory containing the managed requirements file. Glob entries keep
their glob suffix after the non-glob prefix is normalized.
## Runtime behavior
- Managed `deny_read` entries are appended to the effective
`FileSystemSandboxPolicy` after the selected permission profile is
resolved.
- Exact paths become `FileSystemPath::Path { access: None }`; glob
patterns become `FileSystemPath::GlobPattern { access: None }`.
- When managed deny-read entries are present, `sandbox_mode` is
constrained to `read-only` or `workspace-write`; `danger-full-access`
and `external-sandbox` cannot silently bypass the managed read-deny
policy.
- On Windows, the managed deny-read policy is enforced for direct file
tools, but shell subprocess reads are not sandboxed yet, so startup
emits a warning for that platform.
- `/debug-config` shows the effective managed requirement as
`permissions.filesystem.deny_read` with its source.
## Stack
1. #15979 - glob deny-read policy/config/direct-tool support
2. #18096 - macOS and Linux sandbox enforcement
3. This PR - managed deny-read requirements
---------
Co-authored-by: Codex <noreply@openai.com>
… import
## Why
`externalAgentConfig/import` used to spawn plugin imports in the
background and return immediately. That meant local marketplace imports
could still be in flight when the caller refreshed plugin state, so
newly imported plugins would not show up right away.
This change makes local marketplace imports complete before the RPC
returns, while keeping remote marketplace imports asynchronous so we do
not block on remote fetches.
## What changed
- split plugin migration details into local and remote marketplace
imports based on the external config source
- import local marketplaces synchronously during
`externalAgentConfig/import`
- return pending remote plugin imports to the app-server so it can
finish them in the background
- clear the plugin and skills caches before responding to plugin
imports, and again after background remote imports complete, so the next
`plugin/list` reloads fresh state
- keep marketplace source parsing encapsulated behind
`is_local_marketplace_source(...)` instead of re-exporting the internal
enum
- add core and app-server coverage for the synchronous local import path
and the pending remote import path
## Verification
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core` (currently fails an existing unrelated
test:
`config_loader::tests::cli_override_can_update_project_local_mcp_server_when_project_is_trusted`)
- `cargo test` (currently fails existing `codex-app-server` integration
tests in MCP/skills/thread-start areas, plus the unrelated `codex-core`
failure above)
## Summary
This changes Codex logout so managed ChatGPT auth is revoked against
AuthAPI before local auth state is removed. CLI logout, TUI `/logout`,
and the app-server account logout path now use the token-revoking logout
flow instead of only deleting `auth.json` / credential store state.
## Root Cause
Logout previously cleared only local auth storage. That removed Codex's
local credentials but did not ask the backend to invalidate the
refresh/access token state associated with a managed ChatGPT login.
## Behavior
For managed ChatGPT auth, logout sends the stored refresh token to
`https://auth.openai.com/oauth/revoke` with `token_type_hint:
refresh_token` and the Codex OAuth client id, then deletes all local
auth stores after revocation succeeds. If only an access token is
available, it falls back to revoking that access token. API key auth and
externally supplied `chatgptAuthTokens` are still only cleared locally
because Codex does not own a refresh token for those modes.
Revocation failures are fail-closed: if Codex cannot load stored auth or
the backend revoke call fails, logout returns an error and leaves local
auth in place so the user can retry instead of silently clearing local
state while backend tokens remain valid.
## Validation
ran local version of `codex-cli` with staging overrides/harness for auth
ran `codex login` then `codex logout`:
saw auth.json clear and backend revocation endpoints were called
```
POST /oauth/revoke
status: 200
revoking access token
should clear auth session
clearing auth session due to token revocation
successfully revoked session and access token
CANONICAL-API-LINE Response: status='200' method='POST' path='/oauth/revoke
```
---------
Co-authored-by: Codex <noreply@openai.com>
Summary
- refactor thread/read into explicit persisted-load, live-load, and
merge steps
- preserve existing SQLite/filesystem/live-thread behavior exactly
- keep ThreadStore migration out of this PR so the next PR is easier to
review
Validation
- this one's a pure reorganization that relies on existing test coverage
Load plugin manifests through a shared discoverable-path helper so
manifest reads, installs, and skill names all see the same alternate
manifest location.
## Summary
- Switch the unknown-thread MCP resource read test from the stdio
subprocess to the in-process app-server path.
- Keep the assertion focused on the returned error message while
avoiding child-process teardown timing issues in nextest.
## Testing
- Not run (not requested)
## Summary
- set the realtime v2 server VAD silence delay to 500ms
- update the default realtime 1.5 backend prompt to the v4 text
- keep the session payload and prompt rendering tests aligned with those
changes
## Why
- the VAD change gives the voice path a longer pause before ending the
user's turn
- the prompt change makes the default bundled realtime prompt match the
current v4 content
## Validation
- `cargo +1.93.0 test -p codex-core realtime_prompt --manifest-path
/tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml`
- `CARGO_TARGET_DIR=/tmp/codex-pr-v4-target cargo +1.93.0 test -p
codex-api
realtime_v2_session_update_includes_background_agent_tool_and_handoff_output_item
--manifest-path
/tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml`
- `CARGO_TARGET_DIR=/tmp/codex-pr-v4-target cargo +1.93.0 test -p
codex-app-server --test all
'suite::v2::realtime_conversation::realtime_webrtc_start_emits_sdp_notification'
--manifest-path /tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml
-- --exact`
## Summary
Stack PR3 for feature-gated agent identity support.
This PR adds per-thread agent task registration behind
`features.use_agent_identity`. Tasks are minted on the first real user
turn and cached in thread runtime state for later turns.
## Stack
- PR1: https://github.com/openai/codex/pull/17385 - add
`features.use_agent_identity`
- PR2: https://github.com/openai/codex/pull/17386 - register agent
identities when enabled
- PR3: https://github.com/openai/codex/pull/17387 - this PR, original
task registration slice
- PR3.1: https://github.com/openai/codex/pull/17978 - persist and
prewarm registered tasks per thread
- PR4: https://github.com/openai/codex/pull/17980 - use `AgentAssertion`
downstream when enabled
## Validation
Covered as part of the local stack validation pass:
- `just fmt`
- `cargo test -p codex-core --lib agent_identity`
- `cargo test -p codex-core --lib agent_assertion`
- `cargo test -p codex-core --lib websocket_agent_task`
- `cargo test -p codex-api api_bridge`
- `cargo build -p codex-cli --bin codex`
## Notes
The full local app-server E2E path is still being debugged after PR
creation. The current branch stack is directionally ready for review
while that follow-up continues.
## Problem
When a user resumed or forked a session, the TUI could render the
restored thread history immediately, but it did not receive token usage
until a later model turn emitted a fresh usage event. That left the
context/status UI blank or stale during the exact window where the user
expects resumed state to look complete. Core already reconstructed token
usage from the rollout; the missing behavior was app-server lifecycle
replay to the client that just attached.
## Mental model
Token usage has two representations. The rollout is the durable source
of historical `TokenCount` events, and the core session cache is the
in-memory snapshot reconstructed from that rollout on resume or fork.
App-server v2 clients do not read core state directly; they learn about
usage through `thread/tokenUsage/updated`. The fix keeps those roles
separate: core exposes the restored `TokenUsageInfo`, and app-server
sends one targeted notification after a successful `thread/resume` or
`thread/fork` response when that restored snapshot exists.
This notification is not a new model event. It is a replay of
already-persisted state for the client that just attached. That
distinction matters because using the normal core event path here would
risk duplicating `TokenCount` entries in the rollout and making future
resumes count historical usage twice.
## Non-goals
This change does not add a new protocol method or payload shape. It
reuses the existing v2 `thread/tokenUsage/updated` notification and the
TUI’s existing handler for that notification.
This change does not alter how token usage is computed, accumulated,
compacted, or written during turns. It only exposes the token usage that
resume and fork reconstruction already restored.
This change does not broadcast historical usage replay to every
subscribed client. The replay is intentionally scoped to the connection
that requested resume or fork so already-attached clients are not
surprised by an old usage update while they may be rendering live
activity.
## Tradeoffs
Sending the usage notification after the JSON-RPC response preserves a
clear lifecycle order: the client first receives the thread object, then
receives restored usage for that thread. The tradeoff is that usage is
still a notification rather than part of the `thread/resume` or
`thread/fork` response. That keeps the protocol shape stable and avoids
duplicating usage fields across response types, but clients must
continue listening for notifications after receiving the response.
The helper selects the latest non-in-progress turn id for the replayed
usage notification. This is conservative because restored usage belongs
to completed persisted accounting, not to newly attached in-flight work.
The fallback to the last turn preserves a stable wire payload for
unusual histories, but histories with no meaningful completed turn still
have a weak attribution story.
## Architecture
Core already seeds `Session` token state from the last persisted rollout
`TokenCount` during `InitialHistory::Resumed` and
`InitialHistory::Forked`. The new core accessor exposes the complete
`TokenUsageInfo` through `CodexThread` without giving app-server direct
session mutation authority.
App-server calls that accessor from three lifecycle paths: cold
`thread/resume`, running-thread resume/rejoin, and `thread/fork`. In
each path, the server sends the normal response first, then calls a
shared helper that converts core usage into
`ThreadTokenUsageUpdatedNotification` and sends it only to the
requesting connection.
The tests build fake rollouts with a user turn plus a persisted token
usage event. They then exercise `thread/resume` and `thread/fork`
without starting another model turn, proving that restored usage arrives
before any next-turn token event could be produced.
## Observability
The primary debug path is the app-server JSON-RPC stream. After
`thread/resume` or `thread/fork`, a client should see the response
followed by `thread/tokenUsage/updated` when the source rollout includes
token usage. If the notification is absent, check whether the rollout
contains an `event_msg` payload of type `token_count`, whether core
reconstruction seeded `Session::token_usage_info`, and whether the
connection stayed attached long enough to receive the targeted
notification.
The notification is sent through the existing
`OutgoingMessageSender::send_server_notification_to_connections` path,
so existing app-server tracing around server notifications still
applies. Because this is a replay, not a model turn event, debugging
should start at the resume/fork handlers rather than the turn event
translation in `bespoke_event_handling`.
## Tests
The focused regression coverage is `cargo test -p codex-app-server
emits_restored_token_usage`, which covers both resume and fork. The core
reconstruction guard is `cargo test -p codex-core
record_initial_history_seeds_token_info_from_rollout`.
Formatting and lint/fix passes were run with `just fmt`, `just fix -p
codex-core`, and `just fix -p codex-app-server`. Full crate test runs
surfaced pre-existing unrelated failures in command execution and plugin
marketplace tests; the new token usage tests passed in focused runs and
within the app-server suite before the unrelated command execution
failure.
## Summary
- replace the unsubscribe-during-turn test's sleep/polling flow with a
gated streaming SSE response
- add request-count notification support to the streaming SSE test
server so the test can wait for the in-flight Responses request
deterministically
## Scope
- codex-rs/app-server/tests/suite/v2/thread_unsubscribe.rs
- codex-rs/core/tests/common/streaming_sse.rs
## Validation
- Not run locally; this is a narrow extraction from the prior CI-green
branch.
---------
Co-authored-by: Codex <noreply@openai.com>