Add a temporary internal remote_plugin feature flag that merges remote
marketplaces into plugin/list and routes plugin/read through the remote
APIs when needed, while keeping pure local marketplaces working as
before.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add experimental turn/start.environments params for per-turn
environment id + cwd selections
- pass selections through core protocol ops and resolve them with
EnvironmentManager before TurnContext creation
- treat omitted selections as default behavior, empty selections as no
environment, and non-empty selections as first environment/cwd as the
turn primary
## Testing
- ran `just fmt`
- ran `just write-app-server-schema`
- not run: unit tests for this stacked PR
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- Adds a process-local, in-memory cookie store for ChatGPT HTTP clients.
- Limits cookie storage and replay to a shared ChatGPT host allowlist.
- Wires the shared store into the default Codex reqwest client and
backend client.
- Shares the ChatGPT host allowlist with remote-control URL validation
to avoid drift.
- Enables reqwest cookie support and updates lockfiles.
## Summary
This PR fully reverts the previously merged Agent Identity runtime
integration from the old stack:
https://github.com/openai/codex/pull/17387/changes
It removes the Codex-side task lifecycle wiring, rollout/session
persistence, feature flag plumbing, lazy `auth.json` mutation,
background task auth paths, and request callsite changes introduced by
that stack.
This leaves the repo in a clean pre-AgentIdentity integration state so
the follow-up PRs can reintroduce the pieces in smaller reviewable
layers.
## Stack
1. This PR: full revert
2. https://github.com/openai/codex/pull/18871: move Agent Identity
business logic into a crate
3. https://github.com/openai/codex/pull/18785: add explicit
AgentIdentity auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate auth callsites
through AuthProvider
## Testing
Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
## Why
The device-key protocol needs an app-server implementation that keeps
local key operations behind the same request-processing boundary as
other v2 APIs.
app-server owns request dispatch, transport policy, documentation, and
JSON-RPC error shaping. `codex-device-key` owns key binding, validation,
platform provider selection, and signing mechanics. Keeping the adapter
thin makes the boundary easier to review and avoids moving local
key-management details into thread orchestration code.
## What changed
- Added `DeviceKeyApi` as the app-server adapter around
`DeviceKeyStore`.
- Converted protocol protection policies, payload variants, algorithms,
and protection classes to and from the device-key crate types.
- Encoded SPKI public keys and DER signatures as base64 protocol fields.
- Routed `device/key/create`, `device/key/public`, and `device/key/sign`
through `MessageProcessor`.
- Rejected remote transports before provider access while allowing local
`stdio` and in-process callers to reach the device-key API.
- Added stdio, in-process, and websocket tests for device-key validation
and transport policy.
- Documented the device-key methods in the app-server v2 method list.
## Test coverage
- `device_key_create_rejects_empty_account_user_id`
- `in_process_allows_device_key_requests_to_reach_device_key_api`
- `device_key_methods_are_rejected_over_websocket`
## Stack
This is PR 3 of 4 in the device-key app-server stack. It is stacked on
#18429.
## Validation
- `cargo test -p codex-app-server device_key`
- `just fix -p codex-app-server`
## Why
PR #18431 exposed a Bazel clippy failure in the app-server unit-test
target across Linux, macOS, and Windows. The failing lint was
`clippy::await_holding_invalid_type`: two tracing tests serialized
access to global tracing state by holding a `tokio::sync::MutexGuard`
across awaited test work.
That serialization is still needed because the tests share
process-global tracing setup and exporter state, but it should not
require holding an async mutex guard through the whole test body.
## What changed
- Replaced the bespoke async `tracing_test_guard` helper with
`serial_test` on the two tracing tests that need global tracing
serialization.
- Removed the `#[expect(clippy::await_holding_invalid_type)]`
annotations and the lock guard callsites that Bazel clippy rejected.
## Validation
- `cargo test -p codex-app-server jsonrpc_span`
- `just fix -p codex-app-server`
- `git diff --check`
I also attempted the exact failing Bazel clippy target locally with
BuildBuddy disabled: `bazel --noexperimental_remote_repo_contents_cache
build --config=clippy --bes_backend= --remote_cache=
--experimental_remote_downloader= --
//codex-rs/app-server:app-server-unit-tests-bin`. That run did not reach
clippy because Bazel timed out downloading `libcap-2.27.tar.gz` from
`kernel.org`.
## Why
Permission approval responses must not be able to grant more access than
the tool requested. Moving this flow to `PermissionProfile` means the
comparison must be profile-shaped instead of `SandboxPolicy`-shaped, and
cwd-relative special paths such as `:cwd` and `:project_roots` must stay
anchored to the turn that produced the request.
## What changed
This implements semantic `PermissionProfile` intersection in
`codex-sandboxing` for file-system and network permissions. The
intersection accepts narrower path grants, rejects broader grants,
preserves deny-read carve-outs and glob scan depth, and materializes
cwd-dependent special-path grants to absolute paths before they can be
recorded for reuse.
The request-permissions response paths now use that intersection
consistently. App-server captures the request turn cwd before waiting
for the client response, includes that cwd in the v2 approval params,
and core stores the requested profile plus cwd for direct TUI/client
responses and Guardian decisions before recording turn- or
session-scoped grants. The TUI app-server bridge now preserves the
app-server request cwd when converting permission approval params into
core events.
## Verification
- `cargo test -p codex-sandboxing intersect_permission_profiles --
--nocapture`
- `cargo test -p codex-app-server request_permissions_response --
--nocapture`
- `cargo test -p codex-core
request_permissions_response_materializes_session_cwd_grants_before_recording
-- --nocapture`
- `cargo check -p codex-tui --tests`
- `cargo check --tests`
- `cargo test -p codex-tui
app_server_request_permissions_preserves_file_system_permissions`
## Summary
- attach the authoritative Codex thread id to MCP tool request
`_meta.threadId` for model-initiated tool calls
- attach the same thread id for manual `mcpServer/tool/call` requests
before invoking the MCP server
- cover both metadata helper behavior and the manual app-server MCP path
in tests
needed because the Rust app-server is the last place that still has
authoritative knowledge of “this model-generated MCP tool call belongs
to conversation/thread X” before the request leaves Codex and reaches
Hoopa. It adds threadId to MCP request metadata in the model-generated
tool-call path, using sess.conversation_id, and also does the same for
the manual mcpServer/tool/call path.
## Test plan
- `cargo test -p codex-core
mcp_tool_call_thread_id_meta_is_added_to_request_meta --lib`
- `cargo test -p codex-app-server
mcp_server_tool_call_returns_tool_result`
Paired Hoopa consumer PR: https://github.com/openai/openai/pull/833263
## Why
Clients need a stable app-server protocol surface for enrolling a local
device key, retrieving its public key, and producing a device-bound
proof.
The protocol reports `protectionClass` explicitly so clients can
distinguish hardware-backed keys from an explicitly allowed OS-protected
fallback. Signing uses a tagged `DeviceKeySignPayload` enum rather than
arbitrary bytes so each signed statement is auditable at the API
boundary.
## What changed
- Added v2 JSON-RPC methods for `device/key/create`,
`device/key/public`, and `device/key/sign`.
- Added request/response types for device-key metadata, SPKI public
keys, protection classes, and ECDSA signatures.
- Added `DeviceKeyProtectionPolicy` with hardware-only default behavior
and an explicit `allow_os_protected_nonextractable` option.
- Added the initial `remoteControlClientConnection` signing payload
variant.
- Regenerated JSON Schema and TypeScript fixtures for app-server
clients.
## Stack
This is PR 1 of 4 in the device-key app-server stack.
## Validation
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
## Summary
- Move external agent config migration logic and tests from `codex-core`
into `app-server/src/config`.
- Keep the migration service crate-private to app-server and update the
API adapter imports.
- Remove stale core re-exports and expose only the needed marketplace
source helper.
## Testing
- `cargo test -p codex-app-server config::external_agent_config`
- `just fmt`
- `just fix -p codex-app-server`
- `just fix -p codex-core`
- `git diff --check`
Deferred dynamic tools need to round-trip a namespace so a tool returned
by `tool_search` can be called through the same registry key that core
uses for dispatch.
This change adds namespace support for dynamic tool specs/calls,
persists it through app-server thread state, and routes dynamic tool
calls by full `ToolName` while still sending the app the leaf tool name.
Deferred dynamic tools must provide a namespace; non-deferred dynamic
tools may remain top-level.
It also introduces `LoadableToolSpec` as the shared
function-or-namespace Responses shape used by both `tool_search` output
and dynamic tool registration, so dynamic tools use the same wrapping
logic in both paths.
Validation:
- `cargo test -p codex-tools`
- `cargo test -p codex-core tool_search`
---------
Co-authored-by: Sayan Sisodiya <sayan@openai.com>
## Why
This PR prepares the stack to enable Clippy await-holding lints that
were left disabled in #18178. The mechanical lock-scope cleanup is
handled separately; this PR is the documentation/configuration layer for
the remaining await-across-guard sites.
Without explicit annotations, reviewers and future maintainers cannot
tell whether an await-holding warning is a real concurrency smell or an
intentional serialization boundary.
## What changed
- Configures `clippy.toml` so `await_holding_invalid_type` also covers
`tokio::sync::{MutexGuard,RwLockReadGuard,RwLockWriteGuard}`.
- Adds targeted `#[expect(clippy::await_holding_invalid_type, reason =
...)]` annotations for intentional async guard lifetimes.
- Documents the main categories of intentional cases: active-turn state
transitions that must remain atomic, session-owned MCP manager accesses,
remote-control websocket serialization, JS REPL kernel/process
serialization, OAuth persistence, external bearer token refresh
serialization, and tests that intentionally serialize shared global or
session-owned state.
- For external bearer token refresh, documents the existing
serialization boundary: holding `cached_token` across the provider
command prevents concurrent cache misses from starting duplicate refresh
commands, and the current behavior is small enough that an explicit
expectation is easier to maintain than adding another synchronization
primitive.
## Verification
- `cargo clippy -p codex-login --all-targets`
- `cargo clippy -p codex-connectors --all-targets`
- `cargo clippy -p codex-core --all-targets`
- The follow-up PR #18698 enables `await_holding_invalid_type` and
`await_holding_lock` as workspace `deny` lints, so any undocumented
remaining offender will fail Clippy.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18423).
* #18698
* __->__ #18423
## Why
Customers need finer-grained control over allowed sandbox modes based on
the host Codex is running on. For example, they may want stricter
sandbox limits on devboxes while keeping a different default elsewhere.
Our current cloud requirements can target user/account groups, but they
cannot vary sandbox requirements by host. That makes remote development
environments awkward because the same top-level `allowed_sandbox_modes`
has to apply everywhere.
## What
Adds a new `remote_sandbox_config` section to `requirements.toml`:
```toml
allowed_sandbox_modes = ["read-only"]
[[remote_sandbox_config]]
hostname_patterns = ["*.org"]
allowed_sandbox_modes = ["read-only", "workspace-write"]
[[remote_sandbox_config]]
hostname_patterns = ["*.sh", "runner-*.ci"]
allowed_sandbox_modes = ["read-only", "danger-full-access"]
```
During requirements resolution, Codex resolves the local host name once,
preferring the machine FQDN when available and falling back to the
cleaned kernel hostname. This host classification is best effort rather
than authenticated device proof.
Each requirements source applies its first matching
`remote_sandbox_config` entry before it is merged with other sources.
The shared merge helper keeps that `apply_remote_sandbox_config` step
paired with requirements merging so new requirements sources do not have
to remember the extra call.
That preserves source precedence: a lower-precedence requirements file
with a matching `remote_sandbox_config` cannot override a
higher-precedence source that already set `allowed_sandbox_modes`.
This also wires the hostname-aware resolution through app-server,
CLI/TUI config loading, config API reads, and config layer metadata so
they all evaluate remote sandbox requirements consistently.
## Verification
- `cargo test -p codex-config remote_sandbox_config`
- `cargo test -p codex-config host_name`
- `cargo test -p codex-core
load_config_layers_applies_matching_remote_sandbox_config`
- `cargo test -p codex-core
system_remote_sandbox_config_keeps_cloud_sandbox_modes`
- `cargo test -p codex-config`
- `cargo test -p codex-core` unit tests passed; `tests/all.rs`
integration matrix was intentionally stopped after the relevant focused
tests passed
- `just fix -p codex-config`
- `just fix -p codex-core`
- `cargo check -p codex-app-server`
## Summary
Making thread id optional so that we can better cache resources for MCPs
for connectors since their resource templates is universal and not
particular to projects.
- Make `mcpServer/resource/read` accept an optional `threadId`
- Read resources from the current MCP config when no thread is supplied
- Keep the existing thread-scoped path when `threadId` is present
- Update the generated schemas, README, and integration coverage
## Testing
- `just write-app-server-schema`
- `just fmt`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-mcp`
- `cargo test -p codex-app-server --test all mcp_resource`
- `just fix -p codex-mcp`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
## Why
#18274 made `PermissionProfile` the canonical file-system permissions
shape, but the round-trip from `FileSystemSandboxPolicy` to
`PermissionProfile` still dropped one piece of policy metadata:
`glob_scan_max_depth`.
That field is security-relevant for deny-read globs such as `**/*.env`.
On Linux, bubblewrap sandbox construction uses it to bound unreadable
glob expansion. If a profile copied from active runtime permissions
loses this value and is submitted back as an override, the resulting
`FileSystemSandboxPolicy` can behave differently even though the visible
permission entries look equivalent.
## What changed
- Add `glob_scan_max_depth` to protocol `FileSystemPermissions` and
preserve it when converting to/from `FileSystemSandboxPolicy`.
- Keep legacy `read`/`write` JSON for simple path-only permissions, but
force canonical JSON when glob scan depth is present so the metadata is
not silently dropped.
- Carry `globScanMaxDepth` through app-server
`AdditionalFileSystemPermissions`, generated JSON/TypeScript schemas,
and app-server/TUI conversion call sites.
- Preserve the metadata through sandboxing permission normalization,
merging, and intersection.
- Carry the merged scan depth into the effective
`FileSystemSandboxPolicy` used for command execution, so bounded
deny-read globs reach Linux bubblewrap materialization.
## Verification
- `cargo test -p codex-sandboxing glob_scan -- --nocapture`
- `cargo test -p codex-sandboxing policy_transforms -- --nocapture`
- `just fix -p codex-sandboxing`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18713).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* #18279
* #18278
* #18277
* #18276
* #18275
* __->__ #18713
## Why
Codex needs a first-class `amazon-bedrock` model provider so users can
select Bedrock without copying a full provider definition into
`config.toml`. The provider has Codex-owned defaults for the pieces that
should stay consistent across users: the display `name`, Bedrock
`base_url`, and `wire_api`.
At the same time, users still need a way to choose the AWS credential
profile used by their local environment. This change makes
`amazon-bedrock` a partially modifiable built-in provider: code owns the
provider identity and endpoint defaults, while user config can set
`model_providers.amazon-bedrock.aws.profile`.
For example:
```toml
model_provider = "amazon-bedrock"
[model_providers.amazon-bedrock.aws]
profile = "codex-bedrock"
```
## What Changed
- Added `amazon-bedrock` to the built-in model provider map with:
- `name = "Amazon Bedrock"`
- `base_url = "https://bedrock-mantle.us-east-1.api.aws/v1"`
- `wire_api = "responses"`
- Added AWS provider auth config with a profile-only shape:
`model_providers.<id>.aws.profile`.
- Kept AWS auth config restricted to `amazon-bedrock`; custom providers
that set `aws` are rejected.
- Allowed `model_providers.amazon-bedrock` through reserved-provider
validation so it can act as a partial override.
- During config loading, only `aws.profile` is copied from the
user-provided `amazon-bedrock` entry onto the built-in provider. Other
Bedrock provider fields remain hard-coded by the built-in definition.
- Updated the generated config schema for the new provider AWS profile
config.
## Why
Cloud-hosted sessions need a way for the service that starts or manages
a thread to provide session-owned config without treating all config as
if it came from the same user/project/workspace TOML stack.
The important boundary is ownership: some values should be controlled by
the session/orchestrator, some by the authenticated user, and later some
may come from the executor. The earlier broad config-store shape made
that boundary too fuzzy and overlapped heavily with the existing
filesystem-backed config loader. This PR starts with the smaller piece
we need now: a typed session config loader that can feed the existing
config layer stack while preserving the normal precedence and merge
behavior.
## What Changed
- Added `ThreadConfigLoader` and related typed payloads in
`codex-config`.
- `SessionThreadConfig` currently supports `model_provider`,
`model_providers`, and feature flags.
- `UserThreadConfig` is present as an ownership boundary, but does not
yet add TOML-backed fields.
- `NoopThreadConfigLoader` preserves existing behavior when no external
loader is configured.
- `StaticThreadConfigLoader` supports tests and simple callers.
- Taught thread config sources to produce ordinary `ConfigLayerEntry`
values so the existing `ConfigLayerStack` remains the place where
precedence and merging happen.
- Wired the loader through `ConfigBuilder`, the config loader, and
app-server startup paths so app-server can provide session-owned config
before deriving a thread config.
- Added coverage for:
- translating typed thread config into config layers,
- inserting thread config layers into the stack at the right precedence,
- applying session-provided model provider and feature settings when
app-server derives config from thread params.
## Follow-Ups
This intentionally stops short of adding the remote/service transport.
The next pieces are expected to be:
1. Define the proto/API shape for this interface.
2. Add a client implementation that can source session config from the
service side.
## Verification
- Added unit coverage in `codex-config` for the loader and layer
conversion.
- Added `codex-core` config loader coverage for thread config layer
precedence.
- Added app-server coverage that verifies session thread config wins
over request-provided config for model provider and feature settings.
## Summary
Adds a second realtime v2 function tool, `remain_silent`, so the
realtime model has an explicit non-speaking action when the
collaboration mode or latest context says it should not answer aloud.
This is stacked on #18597.
## Design
- Advertise `remain_silent` alongside `background_agent` in realtime v2
conversational sessions.
- Parse `remain_silent` function calls into a typed
`RealtimeEvent::NoopRequested` event.
- Have core answer that function call with an empty
`function_call_output` and deliberately avoid `response.create`, so no
follow-up realtime response is requested.
- Keep the event hidden from app-server/TUI surfaces; it is operational
plumbing, not user-visible conversation content.
Migrate the conversation summary App Server methods to ThreadStore
Because this app server api allows explicitly fetching the thread by
rollout path, intercept that case in the app server code and (a) route
directly to underlying local thread store methods if we're using a local
thread store, or (b) throw an unsupported error if we're using a remote
thread store. This keeps the thread store API clean and all filesystem
operations inside of the local thread store, which pushing the
"fundamental incompatibility" check as early as possible.
## Summary
This PR aims to improve integration between the realtime model and the
codex agent by sharing more context with each other. In particular, we
now share full realtime conversation transcript deltas in addition to
the delegation message.
realtime_conversation.rs now turns a handoff into:
```
<realtime_delegation>
<input>...</input>
<transcript_delta>...</transcript_delta>
</realtime_delegation>
```
## Implementation notes
The transcript is accumulated in the realtime websocket layer as parsed
realtime events arrive. When a background-agent handoff is requested,
the current transcript snapshot is copied onto the handoff event and
then serialized by `realtime_conversation.rs` into the hidden realtime
delegation envelope that Codex receives as user-turn context.
For Realtime V2, the session now explicitly enables input audio
transcription, and the parser handles the relevant input/output
transcript completion events so the snapshot includes both user speech
and realtime model responses. The delegation `<input>` remains the
actual handoff request, while `<transcript_delta>` carries the
surrounding conversation history for context.
Reviewers should note that the transcript payload is intended for Codex
context sharing, not UI rendering. The realtime delegation envelope
should stay hidden from the user-facing transcript surface, while still
being included in the background-agent turn so Codex can answer with the
same conversational context the realtime model had.
Wires patch_updated events through app_server. These events are parsed
and streamed while apply_patch is being written by the model. Also adds 500ms of buffering to the patch_updated events in the diff_consumer.
The eventual goal is to use this to display better progress indicators in
the codex app.
- Replace the active models-manager catalog with the deleted core
catalog contents.
- Replace stale hardcoded test model slugs with current bundled model
slugs.
- Keep this as a stacked change on top of the cleanup PR.
## Why
`PermissionProfile` needs stable, canonical file-system semantics before
it can become the primary runtime permissions abstraction. Without a
canonical form, callers have to keep re-deriving legacy sandbox maps and
profile comparisons remain lossy or order-dependent.
## What changed
This adds canonicalization helpers for `FileSystemPermissions` and
`PermissionProfile`, expands special paths into explicit sandbox
entries, and updates permission request/conversion paths to consume
those canonical entries. It also tightens the legacy bridge so root-wide
write profiles with narrower carveouts are not silently projected as
full-disk legacy access.
## Verification
- `cargo test -p codex-protocol
root_write_with_read_only_child_is_not_full_disk_write -- --nocapture`
- `cargo test -p codex-sandboxing permission -- --nocapture`
- `cargo test -p codex-tui permissions -- --nocapture`
- Migrates unloaded `thread/name/set` and `thread/memoryModeSet`
app-server writes behind the generic
`ThreadStore::update_thread_metadata` API rather than adding one-off
store methods for setting thread name or memory mode.
- Implements the local ThreadStore metadata patch path for thread name
and memory mode, including rollout append, legacy name index updates,
SessionMeta validation/update, SQLite reconciliation, and re-reading the
stored thread.
- Adds focused local thread-store unit coverage plus app-server
integration coverage for the migrated unloaded write paths.
## Summary
Introduces a single background/control-plane agent task for ChatGPT
backend requests that do not have a thread-scoped task, with
`AuthManager` owning the default ChatGPT backend authorization decision.
Callers now ask `AuthManager` for the default ChatGPT backend
authorization header. `AuthManager` decides whether that is bearer or
background AgentAssertion based on config/internal state, while
low-level bootstrap paths can explicitly request bearer-only auth.
This PR is stacked on PR4 and focuses on the shared background task auth
plumbing plus the first tranche of backend/control-plane consumers. The
remaining callsite wiring is split into PR4.2 to keep review size down.
## Stack
- PR1: https://github.com/openai/codex/pull/17385 - add
`features.use_agent_identity`
- PR2: https://github.com/openai/codex/pull/17386 - register agent
identities when enabled
- PR3: https://github.com/openai/codex/pull/17387 - register agent tasks
when enabled
- PR3.1: https://github.com/openai/codex/pull/17978 - persist and
prewarm registered tasks per thread
- PR4: https://github.com/openai/codex/pull/17980 - use task-scoped
`AgentAssertion` for downstream calls
- PR4.1: this PR - introduce AuthManager-owned background/control-plane
`AgentAssertion` auth
- PR4.2: https://github.com/openai/codex/pull/18260 - use background
task auth for additional backend/control-plane calls
## What Changed
- add background task registration and assertion minting inside
`codex-login`
- persist `agent_identity.background_task_id` separately from
per-session task state
- make `BackgroundAgentTaskManager` private to `codex-login`; call sites
do not instantiate or pass it around
- teach `AuthManager` the ChatGPT backend base URL and feature-derived
background auth mode from resolved config
- expose bearer-only helpers for bootstrap/registration/refresh-style
paths that must not use AgentAssertion
- wire `AuthManager` default ChatGPT authorization through app listing,
connector directory listing, remote plugins, MCP status/listing,
analytics, and core-skills remote calls
- preserve bearer fallback when the feature is disabled, the backend
host is unsupported, or background task registration is not available
## Validation
- `just fmt`
- `cargo check -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `cargo test -p codex-login agent_identity`
- `cargo test -p codex-model-provider bearer_auth_provider`
- `cargo test -p codex-core agent_assertion`
- `cargo test -p codex-app-server remote_control`
- `cargo test -p codex-cloud-requirements fetch_cloud_requirements`
- `cargo test -p codex-models-manager manager::tests`
- `cargo test -p codex-chatgpt`
- `cargo test -p codex-cloud-tasks`
- `just fix -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `just fix -p codex-app-server`
- `git diff --check`
## Why
This addresses the review comment from #17751 about `marketplace/remove`
app-server test portability:
https://github.com/openai/codex/pull/17751#discussion_r3104378613
The API returns the removed installed root using the app-server's
effective `CODEX_HOME`. On macOS, temporary directory paths can appear
as either `/var/...` or `/private/var/...`, so comparing one raw path
against another can fail even when `marketplace/remove` behaves
correctly.
## What changed
- Removed the direct whole-response equality assertion for the installed
root path.
- Asserted the stable response field, `marketplace_name`, directly.
- Compared the expected and returned installed-root paths after
canonicalizing their existing parent directories, which avoids requiring
the removed leaf directory to still exist.
## Verification
- `cargo test -p codex-app-server
marketplace_remove_deletes_config_and_installed_root`
- `cargo test -p codex-app-server marketplace_remove`
## Summary
Add a new app-server `marketplace/remove` RPC on top of the shared
marketplace-remove implementation.
This change:
- adds `MarketplaceRemoveParams` / `MarketplaceRemoveResponse` to the
app-server protocol
- wires the new request through `codex_message_processor`
- reuses the shared core marketplace-remove flow from the stacked
refactor PR
- updates generated schema files and adds focused app-server coverage
## Validation
- `just write-app-server-schema`
- `just fmt`
- heavy compile/test coverage deferred to GitHub CI per request
## Summary
- persist registered agent tasks in the session state update stream so
the thread can reuse them
- prewarm task registration once identity registration succeeds, while
keeping startup failures best-effort
- isolate the session-side task lifecycle into a dedicated module so
AgentIdentityManager and RegisteredAgentTask do not leak across as many
core layers
## Testing
- cargo test -p codex-core startup_agent_task_prewarm
- cargo test -p codex-core
cached_agent_task_for_current_identity_clears_stale_task
- cargo test -p codex-core record_initial_history_
## Summary
- Add the executor-backed RMCP stdio transport.
- Wire MCP stdio placement through the executor environment config.
- Cover local and executor-backed stdio paths with the existing MCP test
helpers.
## Stack
```text
o #18027 [6/6] Fail exec client operations after disconnect
│
@ #18212 [5/6] Wire executor-backed MCP stdio
│
o #18087 [4/6] Abstract MCP stdio server launching
│
o #18020 [3/6] Add pushed exec process events
│
o #18086 [2/6] Support piped stdin in exec process API
│
o #18085 [1/6] Add MCP server environment config
│
o main
```
---------
Co-authored-by: Codex <noreply@openai.com>
Adds max_context_window to model metadata and routes core context-window
reads through resolved model info. Config model_context_window overrides
are clamped to max_context_window when present; without an override, the
model context_window is used.
## Summary
- Populate `PluginDetail.description` in core for uninstalled cross-repo
plugins when detailed fields are unavailable until install.
- Include the source Git URL plus optional path/ref/sha details in that
fallback description.
- Keep `details_unavailable_reason` as the structured signal while
app-server forwards the description normally.
- Add plugin-read coverage proving the response does not clone the
remote source just to show the message.
## Why
Uninstalled cross-repo plugins intentionally return sparse detail data
so listing/reading does not clone the plugin source. Without a
description, Desktop and TUI detail pages look like an ordinary empty
plugin. This gives users a concrete explanation and source pointer while
keeping the existing structured reason available for callers.
## Validation
- `just fmt`
- `cargo test -p codex-core
read_plugin_for_config_uninstalled_git_source_requires_install_without_cloning`
- `cargo test -p codex-app-server plugin_read --test all`
- `just fix -p codex-core`
- `just fix -p codex-app-server`
Note: `cargo test -p codex-app-server` was also attempted before the
latest refactor and failed broadly in unrelated v2
thread/realtime/review/skills suites; the new plugin-read test passed in
that run as well.
Cap the model-visible skills section to a small share of the context
window, with a fallback character budget, and keep only as many implicit
skills as fit within that budget.
Emit a non-fatal warning when enabled skills are omitted, and add a new
app-server warning notification
Record thread-start skill metrics for total enabled skills, kept skills,
and whether truncation happened
---------
Co-authored-by: Matthew Zeng <mzeng@openai.com>
Co-authored-by: Codex <noreply@openai.com>
## Summary
- trust-gate project `.codex` layers consistently, including repos that
have `.codex/hooks.json` or `.codex/execpolicy/*.rules` but no
`.codex/config.toml`
- keep disabled project layers in the config stack so nested trusted
project layers still resolve correctly, while preventing hooks and exec
policies from loading until the project is trusted
- update app-server/TUI onboarding copy to make the trust boundary
explicit and add regressions for loader, hooks, exec-policy, and
onboarding coverage
## Security
Before this change, an untrusted repo could auto-load project hooks or
exec policies from `.codex/` as long as `config.toml` was absent. This
makes trust the single gate for project-local config, hooks, and exec
policies.
## Stack
- Parent of #15936
## Test
- cargo test -p codex-core without_config_toml
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Update the plugin API for the new remote plugin model.
The mental model is no longer “keep local plugin state in sync with
remote.” Instead, local and remote plugins are becoming separate
sources. Remote catalog entries can be shown directly from the remote
API before installation; after installation they are still downloaded
into the local cache for execution, but remote installed state will come
from the API and be held in memory rather than being read from config.
• ## API changes
- Remove `forceRemoteSync` from `plugin/list`, `plugin/install`, and
`plugin/uninstall`.
- Remove `remoteSyncError` from `plugin/list`.
- Add remote-capable metadata to `plugin/list` / `plugin/read`:
- nullable `marketplaces[].path`
- `source: { type: "remote", downloadUrl }`
- URL asset fields alongside local path fields:
`composerIconUrl`, `logoUrl`, `screenshotUrls`
- Make `plugin/read` and `plugin/install` source-compatible:
- `marketplacePath?: AbsolutePathBuf | null`
- `remoteMarketplaceName?: string | null`
- exactly one source is required at runtime
## Why
The large Rust test suites are slow and include some of our flakiest
tests, so we want to run them with Bazel native sharding while keeping
shard membership stable between runs.
This is the simpler follow-up to the explicit-label experiment in
#17998. Since #18397 upgraded Codex to `rules_rs` `0.0.58`, which
includes the stable test-name hashing support from
hermeticbuild/rules_rust#14, this PR only needs to wire Codex's Bazel
macros into that support.
Using native sharding preserves BuildBuddy's sharded-test UI and Bazel's
per-shard test action caching. Using stable name hashing avoids
reshuffling every test when one test is added or removed.
## What Changed
`codex_rust_crate` now accepts `test_shard_counts` and applies the right
Bazel/rules_rust attributes to generated unit and integration test
rules. Matched tests are also marked `flaky = True`, giving them Bazel's
default three attempts.
This PR shards these labels 8 ways:
```text
//codex-rs/core:core-all-test
//codex-rs/core:core-unit-tests
//codex-rs/app-server:app-server-all-test
//codex-rs/app-server:app-server-unit-tests
//codex-rs/tui:tui-unit-tests
```
## Verification
`bazel query --output=build` over the selected public labels and their
inner unit-test binaries confirmed the expected `shard_count = 8`,
`flaky = True`, and `experimental_enable_sharding = True` attributes.
Also verified that we see the shards as expected in BuildBuddy so they
can be analyzed independently.
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add first-class marketplace support for git-backed plugin sources
- keep the newer marketplace parsing behavior from `main`, including
alternate manifest locations and string local sources
- materialize remote plugin sources during install, detail reads, and
non-curated cache refresh
- expose git plugin source metadata through the app-server protocol
## Details
This teaches the marketplace parser to accept all of the following:
- local string sources such as `"source": "./plugins/foo"`
- local object sources such as
`{"source":"local","path":"./plugins/foo"}`
- remote repo-root sources such as
`{"source":"url","url":"https://github.com/org/repo.git"}`
- remote subdir sources such as
`{"source":"git-subdir","url":"owner/repo","path":"plugins/foo","ref":"main","sha":"..."}`
It also preserves the newer tolerant behavior from `main`: invalid or
unsupported plugin entries are skipped instead of breaking the whole
marketplace.
## Validation
- `cargo test -p codex-core plugins::marketplace::tests`
- `just fix -p codex-core`
- `just fmt`
## Notes
- A full `cargo test -p codex-core` run still hit unrelated existing
failures in agent and multi-agent tests during this session; the
marketplace-focused suite passed after the rebase resolution.
Follow-up to https://github.com/openai/codex/pull/18178, where we called
out enabling the await-holding lint as a follow-up.
The long-term goal is to enable Clippy coverage for async guards held
across awaits. This PR is intentionally only the first, low-risk cleanup
pass: it narrows obvious lock guard lifetimes and leaves
`codex-rs/Cargo.toml` unchanged so the lint is not enabled until the
remaining cases are fixed or explicitly justified. It intentionally
leaves the active-turn/turn-state locking pattern alone because those
checks and mutations need to stay atomic.
## Common fixes used here
These are the main patterns reviewers should expect in this PR, and they
are also the patterns to reach for when fixing future `await_holding_*`
findings:
- **Scope the guard to the synchronous work.** If the code only needs
data from a locked value, move the lock into a small block, clone or
compute the needed values, and do the later `.await` after the block.
- **Use direct one-line mutations when there is no later await.** Cases
like `map.lock().await.remove(&id)` are acceptable when the guard is
only needed for that single mutation and the statement ends before any
async work.
- **Drain or clone work out of the lock before notifying or awaiting.**
For example, the JS REPL drains pending exec senders into a local vector
and the websocket writer clones buffered envelopes before it serializes
or sends them.
- **Use a `Semaphore` only when serialization is intentional across
async work.** The test serialization guards intentionally span awaited
setup or execution, so using a semaphore communicates "one at a time"
without holding a mutex guard.
- **Remove the mutex when there is only one owner.** The PTY stdin
writer task owns `stdin` directly; the old `Arc<Mutex<_>>` did not
protect shared access because nothing else had access to the writer.
- **Do not split locks that protect an atomic invariant.** This PR
deliberately leaves active-turn/turn-state paths alone because those
checks and mutations need to stay atomic. Those cases should be fixed
separately with a design change or documented with `#[expect]`.
## What changed
- Narrow scoped async mutex guards in app-server, JS REPL, network
approval, remote-control websocket, and the RMCP test server.
- Replace test-only async mutex serialization guards with semaphores
where the guard intentionally lives across async work.
- Let the PTY pipe writer task own stdin directly instead of wrapping it
in an async mutex.
## Verification
- `just fix -p codex-core -p codex-app-server -p codex-rmcp-client -p
codex-shell-escalation -p codex-utils-pty -p codex-utils-readiness`
- `just clippy -p codex-core`
- `cargo test -p codex-core -p codex-app-server -p codex-rmcp-client -p
codex-shell-escalation -p codex-utils-pty -p codex-utils-readiness` was
run; the app-server suite passed, and `codex-core` failed in the local
sandbox on six otel approval tests plus
`suite::user_shell_cmd::user_shell_command_does_not_set_network_sandbox_env_var`,
which appear to depend on local command approval/default rules and
`CODEX_SANDBOX_NETWORK_DISABLED=1` in this environment.
## Summary
First PR in the split from #17956.
- adds the core/app-server `RateLimitReachedType` shape
- maps backend `rate_limit_reached_type` into Codex rate-limit snapshots
- carries the field through app-server notifications/responses and
generated schemas
- updates existing constructors/tests for the new optional field
## Validation
- `cargo test -p codex-backend-client`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server rate_limits`
- `cargo test -p codex-tui workspace_`
- `cargo test -p codex-tui status_`
- `just fmt`
- `just fix -p codex-backend-client`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
To improve performance of UI loads from the app, add two main
improvements:
1. The `thread/list` api now gets a `sortDirection` request field and a
`backwardsCursor` to the response, which lets you paginate forwards and
backwards from a window. This lets you fetch the first few items to
display immediately while you paginate to fill in history, then can
paginate "backwards" on future loads to catch up with any changes since
the last UI load without a full reload of the entire data set.
2. Added a new `thread/turns/list` api which also has sortDirection and
backwardsCursor for the same behavior as `thread/list`, allowing you the
same small-fetch for immediate display followed by background fill-in
and resync catchup.
## Summary
- Normalize deferred MCP and dynamic tools into `ToolSearchEntry` values
before constructing `ToolSearchHandler`.
- Move the tool-search entry adapter out of `tools/handlers` and into
`tools/tool_search_entry.rs` so the handlers directory stays focused on
handlers.
- Keep `ToolSearchHandler` operating over one generic entry list for
BM25 search, namespace grouping, and per-bucket default limits.
## Why
Follow-up cleanup for #17849. The dynamic tool-search support made the
handler juggle source-specific MCP and dynamic tool lists, index
arithmetic, output conversion, and namespace emission. This keeps source
adaptation outside the handler so the search loop itself is smaller and
source-agnostic.
## Validation
- `just fmt`
- `cargo test -p codex-core tools::handlers::tool_search::tests`
- `git diff --check`
- `cargo test -p codex-core` currently fails in unrelated
`plugins::manager::tests::list_marketplaces_ignores_installed_roots_missing_from_config`;
rerunning that single test fails the same way at
`core/src/plugins/manager_tests.rs:1692`.
---------
Co-authored-by: pash <pash@openai.com>
Summary
- replace the thread/read persisted-load helper with
ThreadStore::read_thread
- move SQLite/rollout summary, name, fork metadata, and history loading
for persisted reads into LocalThreadStore
- leave getConversationSummary unchanged for a later PR
Context
- Replaces closed stacked PR #18232 after PR #18231 merged and its base
branch was deleted.