## Summary
- Keep the guardian policy installed as guardian base instructions.
- Clear inherited parent `developer_instructions` for guardian review
sessions.
- Update guardian config tests to assert developer instructions are
cleared and policy text is sourced from base instructions.
## Why
Guardian review sessions are intended to run under an isolated guardian
policy. Because the guardian config is cloned from the parent config,
inherited custom or managed developer instructions could otherwise
remain active and conflict with guardian review behavior.
## Validation
- `just fmt`
- `cargo test -p codex-core guardian_review_session_config`
Co-authored-by: Codex <noreply@openai.com>
## Why
A [Windows Cargo
build](https://github.com/openai/codex/actions/runs/24754807756/job/72425641062)
on `main` timed out in several unrelated-looking suites at the same
time:
- `codex-app-server` account tests failed before account logic, while
`mcp.initialize()` was waiting for the first JSON-RPC response.
- `codex-core` `apply_patch_cli` tests timed out while running full
Codex/apply_patch turns.
- `codex-windows-sandbox` legacy session tests timed out while creating
restricted-token child processes and private desktops.
The app-server log reached the test harness write path in
[`McpProcess::initialize_with_params`](731b54d08f/codex-rs/app-server/tests/common/mcp_process.rs (L244-L263)),
but never printed the matching stdout read from
[`read_jsonrpc_message`](731b54d08f/codex-rs/app-server/tests/common/mcp_process.rs (L1123-L1128)).
The server initialize handler is a small bookkeeping/response path
([`message_processor.rs`](731b54d08f/codex-rs/app-server/src/message_processor.rs (L601-L728))),
so the failure looks like Windows runner process/pipe scheduling
starvation rather than account-specific behavior.
## What Changed
This updates `.config/nextest.toml` to serialize two process-heavy sets:
- `codex-core` tests matching `package(codex-core) & kind(test) &
test(apply_patch_cli)`
- `codex-windows-sandbox` tests matching `package(codex-windows-sandbox)
& test(legacy_)`
`codex-app-server` integration tests were already serialized inside
their own package; this change reduces overlap with the other suites
that were saturating the runner at the same time.
## Verification
- `cargo nextest list --filterset "package(codex-core) & kind(test) &
test(apply_patch_cli)"`
- `cargo nextest list --filterset "package(codex-windows-sandbox) &
test(legacy_)"`
The Windows sandbox filter naturally lists no tests on macOS, but it
validates the nextest filter/config syntax locally.
## Why
The exec-server still needs platform sandbox inputs, but the migration
should preserve the `PermissionProfile` that produced them. Keeping only
the derived legacy sandbox map would keep `SandboxPolicy` as the
effective abstraction and would make full-disk vs. restricted profiles
harder to preserve as the permissions stack starts round-tripping
profiles.
`PermissionProfile` entries can also be cwd-sensitive (`:cwd`,
`:project_roots`, relative globs), so the exec-server must carry the
request sandbox cwd instead of resolving those entries against the
long-lived exec-server process cwd.
## What changed
`FileSystemSandboxContext` now carries `permissions: PermissionProfile`
plus an optional `cwd`:
- removed `sandboxPolicy`, `sandboxPolicyCwd`,
`fileSystemSandboxPolicy`, and `additionalPermissions`
- added `permissions` and `cwd`
- kept the platform knobs `windowsSandboxLevel`,
`windowsSandboxPrivateDesktop`, and `useLegacyLandlock`
Core turn and apply-patch paths populate the context from the active
runtime permissions and request cwd. Exec-server derives platform
`SandboxPolicy`/`FileSystemSandboxPolicy` at the filesystem boundary,
adds helper runtime reads there, and rejects cwd-dependent profiles that
arrive without a cwd.
The legacy `FileSystemSandboxContext::new(SandboxPolicy)` constructor
now preserves the old workspace-write conversion semantics for
compatibility tests/callers.
## Verification
- `cargo test -p codex-exec-server`
- `cargo test -p codex-exec-server sandbox_cwd -- --nocapture`
- `cargo test -p codex-exec-server
sandbox_context_new_preserves_legacy_workspace_write_read_only_subpaths
-- --nocapture`
- `cargo test -p codex-core --lib
file_system_sandbox_context_uses_active_attempt -- --nocapture`
## Why
A Mac Bazel CI run saw `remote_notifications_arrive_over_websocket` fail
during shutdown with `remote app-server shutdown channel is closed`
(https://app.buildbuddy.io/invocation/9dac05d6-ae20-40f9-b627-fca6e91cf127).
The remote websocket worker can legitimately finish while `shutdown()`
is waiting for the shutdown acknowledgement: after the test server sends
a notification and exits, the worker may deliver the required disconnect
event, observe that the caller has dropped the event receiver, and exit
before it sends the shutdown one-shot.
That state is already terminal cleanup, not a failed shutdown, so
callers should not see a `BrokenPipe` from the acknowledgement channel.
## What Changed
- Treat a closed remote shutdown acknowledgement as an already-exited
worker while still propagating websocket close errors when the worker
returns them.
- Added a deterministic regression test for the interleaving where the
shutdown command is received and the worker exits before replying.
## Verification
- `cargo test -p codex-app-server-client`
- New test:
`remote::tests::shutdown_tolerates_worker_exit_after_command_is_queued`
Add a temporary internal remote_plugin feature flag that merges remote
marketplaces into plugin/list and routes plugin/read through the remote
APIs when needed, while keeping pure local marketplaces working as
before.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Add first-class Amazon Bedrock Mantle provider support so Codex can keep
using its existing Responses API transport with OpenAI-compatible
AWS-hosted endpoints such as AOA/Mantle.
This is needed for the AWS launch path, where provider traffic should
authenticate with AWS credentials instead of OpenAI bearer credentials.
Requests are authenticated immediately before transport send, so SigV4
signs the final method, URL, headers, and body bytes that `reqwest` will
send.
## What Changed
- Added a new `codex-aws-auth` crate for loading AWS SDK config,
resolving credentials, and signing finalized HTTP requests with AWS
SigV4.
- Added a built-in `amazon-bedrock` provider that targets Bedrock Mantle
Responses endpoints, defaults to `us-east-1`, supports region/profile
overrides, disables WebSockets, and does not require OpenAI auth.
- Added Amazon Bedrock auth resolution in `codex-model-provider`: prefer
`AWS_BEARER_TOKEN_BEDROCK` when set, otherwise use AWS SDK credentials
and SigV4 signing.
- Added `AuthProvider::apply_auth` and `Request::prepare_body_for_send`
so request-signing providers can sign the exact outbound request after
JSON serialization/compression.
- Determine the region by taking the `aws.region` config first (required
for bearer token codepath), and fallback to SDK default region.
## Testing
Amazon Bedrock Mantle Responses paths:
- Built the local Codex binary with `cargo build`.
- Verified the custom proxy-backed `aws` provider using `env_key =
"AWS_BEARER_TOKEN_BEDROCK"` streamed raw `responses` output with
`response.output_text.delta`, `response.completed`, and `mantle-env-ok`.
- Verified a full `codex exec --profile aws` turn returned
`mantle-env-ok`.
- Confirmed the custom provider used the bearer env var, not AWS profile
auth: bogus `AWS_PROFILE` still passed, empty env var failed locally,
and malformed env var reached Mantle and failed with `401
invalid_api_key`.
- Verified built-in `amazon-bedrock` with `AWS_BEARER_TOKEN_BEDROCK` set
passed despite bogus AWS profiles, returning `amazon-bedrock-env-ok`.
- Verified built-in `amazon-bedrock` SDK/SigV4 auth passed with
`AWS_BEARER_TOKEN_BEDROCK` unset and temporary AWS session env
credentials, returning `amazon-bedrock-sdk-env-ok`.
## Why
`build_prompt_input` now initializes `ExecServerRuntimePaths`, which
requires a configured Codex executable path. The previous inline unit
test in `core/src/prompt_debug.rs` built a bare `test_config()` and then
failed before it could assert anything useful:
```text
Codex executable path is not configured
```
This coverage is also integration-shaped: it drives the public
`build_prompt_input` entry point through config, thread, and session
setup rather than testing a small internal helper in isolation.
Bazel CI did not catch this earlier because the affected test was behind
the same wrapped Rust unit-test path fixed by #18913. Before that
launcher/sharding fix, the outer `workspace_root_test` changed the
working directory for Insta compatibility while the inner `rules_rust`
sharding wrapper still expected its runfiles working directory. In
practice, Bazel could report success without executing the Rust test
cases in that shard. Once #18913 makes the wrapper run the Rust test
binary directly and shard with libtest arguments, this stale unit test
actually runs and exposes the missing `codex_self_exe` setup.
## What Changed
- Moved `build_prompt_input_includes_context_and_user_message` out of
`core/src/prompt_debug.rs`.
- Added `core/tests/suite/prompt_debug_tests.rs` and registered it from
`core/tests/suite/mod.rs`.
- Builds the test config with `ConfigBuilder` and provides
`codex_self_exe` using the current test executable, matching the
runtime-path invariant required by prompt debug setup.
- Preserves the existing assertions that the generated prompt input
includes both the debug user message and project-specific user
instructions.
## Verification
- `cargo test -p codex-core --test all
prompt_debug_tests::build_prompt_input_includes_context_and_user_message`
- `bazel test //codex-rs/core:core-all-test
--test_arg=prompt_debug_tests::build_prompt_input_includes_context_and_user_message
--test_output=errors`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18916).
* #18913
* __->__ #18916
Fixes https://github.com/openai/codex/issues/16732.
## Why
`apply_patch` is Codex's primary file edit path, but it was not emitting
`PreToolUse` or `PostToolUse` hook events. That meant hook-based policy,
auditing, and write coordination could observe shell commands while
missing the actual file mutation performed by `apply_patch`.
The issue also exposed that the hook runtime serialized command hook
payloads with `tool_name: "Bash"` unconditionally. Even if `apply_patch`
supplied hook payloads, hooks would either fail to match it directly or
receive misleading stdin that identified the edit as a Bash tool call.
## What Changed
- Added `PreToolUse` and `PostToolUse` payload support to
`ApplyPatchHandler`.
- Exposed the raw patch body as `tool_input.command` for both
JSON/function and freeform `apply_patch` calls.
- Taught tool hook payloads to carry a handler-supplied hook-facing
`tool_name`.
- Preserved existing shell compatibility by continuing to emit `Bash`
for shell-like tools.
- Serialized the selected hook `tool_name` into hook stdin instead of
hardcoding `Bash`.
- Relaxed the generated hook command input schema so `tool_name` can
represent tools other than `Bash`.
## Verification
Added focused handler coverage for:
- JSON/function `apply_patch` calls producing a `PreToolUse` payload.
- Freeform `apply_patch` calls producing a `PreToolUse` payload.
- Successful `apply_patch` output producing a `PostToolUse` payload.
- Shell and `exec_command` handlers continuing to expose `Bash`.
Added end-to-end hook coverage for:
- A `PreToolUse` hook matching `^apply_patch$` blocking the patch before
the target file is created.
- A `PostToolUse` hook matching `^apply_patch$` receiving the patch
input and tool response, then adding context to the follow-up model
request.
- Non-participating tools such as the plan tool continuing not to emit
`PreToolUse`/`PostToolUse` hook events.
Also validated manually with a live `codex exec` smoke test using an
isolated temp workspace and temp `CODEX_HOME`. The smoke test confirmed
that a real `apply_patch` edit emits `PreToolUse`/`PostToolUse` with
`tool_name: "apply_patch"`, a shell command still emits `tool_name:
"Bash"`, and a denying `PreToolUse` hook prevents the blocked patch file
from being created.
## Summary
- add experimental turn/start.environments params for per-turn
environment id + cwd selections
- pass selections through core protocol ops and resolve them with
EnvironmentManager before TurnContext creation
- treat omitted selections as default behavior, empty selections as no
environment, and non-empty selections as first environment/cwd as the
turn primary
## Testing
- ran `just fmt`
- ran `just write-app-server-schema`
- not run: unit tests for this stacked PR
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
#18275 anchors session-scoped `:cwd` and `:project_roots` grants to the
request cwd before recording them for reuse. Relative deny glob entries
need the same treatment. Without anchoring, a stored session permission
can keep a pattern such as `**/*.env` relative, then reinterpret that
deny against a later turn cwd. That makes the persisted profile depend
on the cwd at reuse time instead of the cwd that was reviewed and
approved.
## What changed
`intersect_permission_profiles` now materializes retained
`FileSystemPath::GlobPattern` entries against the request cwd, matching
the existing materialization for cwd-sensitive special paths.
Materialized accepted grants are now deduplicated before deny retention
runs. This keeps the sticky-grant preapproval shape stable when a
repeated request is merged with the stored grant and both `:cwd = write`
and the materialized absolute cwd write are present.
The preapproval check compares against the same materialized form, so a
later request for the same cwd-relative deny glob still matches the
stored anchored grant instead of re-prompting or rejecting.
Tests cover both the storage path and the preapproval path: a
session-scoped `:cwd = write` grant with `**/*.env = none` is stored
with both the cwd write and deny glob anchored to the original request
cwd, cannot be reused from a later cwd, and remains preapproved when
re-requested from the original cwd after merging with the stored grant.
## Verification
- `cargo test -p codex-sandboxing policy_transforms`
- `cargo test -p codex-core --lib
relative_deny_glob_grants_remain_preapproved_after_materialization`
- `cargo clippy -p codex-sandboxing --tests -- -D
clippy::redundant_clone`
- `cargo clippy -p codex-core --lib -- -D clippy::redundant_clone`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18867).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* #18279
* #18278
* #18277
* #18276
* __->__ #18867
## Summary
- tighten the external migration prompt snapshot around stable synthetic
fixture text
- add focused display_description tests for relative path rewriting and
plugin summaries
- split the path-format assertions into smaller, easier-to-read unit
tests
## Why
The previous prompt snapshot was coupled to path text that came from
detected migration items, which made it noisier and more brittle than
necessary. This change keeps the snapshot focused on stable UI structure
and moves dynamic path formatting checks into targeted unit tests.
## Validation
- cargo test -p codex-tui external_agent_config_migration::tests::
- cargo test -p codex-tui
external_agent_config_migration::tests::display_description_
- just fmt
## Notes
Per the repo instructions, I did not rerun tests after the final `just
fmt` pass.
This change aligns the `/statusline` and `/title` UIs around the same
normalized item model so both surfaces use consistent ids, labels, and
preview semantics. It keeps the shared preview work from #18435 ,
tightens the remaining mismatches by standardizing item naming, expands
title/status item coverage where appropriate, and makes `/title` preview
use the same title-specific formatting path as the real rendered
terminal title.
- Normalizes persisted item ids and keeps legacy aliases for
compatibility
- Aligns `status-line` and `terminal-title` items with the shared
preview model
- Routes `terminal-title` preview through title-specific formatting and
truncation
- Updates the affected status/title setup snapshots
Added to `/statusline`:
- status
- task-progress
Normalized in `/statusline`:
- model-name -> model
- project-root -> project-name
Added to `/title`:
- current-dir
- context-remaining
- context-used
- five-hour-limit
- weekly-limit
- codex-version
- used-tokens
- total-input-tokens
- total-output-tokens
- session-id
- fast-mode
- model-with-reasoning
Normalized in `/title`:
- project -> project-name
- thread -> thread-title
- model-name -> model
## Summary
Allow guardian to skip other fields and output only
`{"outcome":"allow"}` when the command is low risk.
This change lets guardian reviews use a non-strict text format while
keeping the JSON schema itself as plain user-visible schema data, so
transport strictness is carried out-of-band instead of through a schema
marker key.
## What changed
- Add an explicit `output_schema_strict` flag to model prompts and pass
it into `codex-api` text formatting.
- Set guardian reviewer prompts to non-strict schema validation while
preserving strict-by-default behavior for normal callers.
- Update the guardian output contract so definitely-low-risk decisions
may return only `{"outcome":"allow"}`.
- Treat bare allow responses as low-risk approvals in the guardian
parser.
- Add tests and snapshots covering the non-strict guardian request and
optional guardian output fields.
## Verification
- `cargo test -p codex-core guardian::tests::guardian`
- `cargo test -p codex-core guardian::tests::`
- `cargo test -p codex-core client_common::tests::`
- `cargo test -p codex-protocol
user_input_serialization_includes_final_output_json_schema`
- `cargo test -p codex-api`
- `git diff --check`
Note: `cargo test -p codex-core` was also attempted, but this desktop
environment injects ambient config/proxy state that causes unrelated
config/session tests expecting pristine defaults to fail.
---------
Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
Co-authored-by: Codex <noreply@openai.com>
## Summary
Adds the standalone `codex-rollout-trace` crate, which defines the raw
trace event format, replay/reduction model, writer, and reducer logic
for reconstructing model-visible conversation/runtime state from
recorded rollout data.
The crate-level design is documented in
[`codex-rs/rollout-trace/README.md`](https://github.com/openai/codex/blob/codex/rollout-trace-crate/codex-rs/rollout-trace/README.md).
## Stack
This is PR 1/5 in the rollout trace stack.
- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command
## Review Notes
This PR intentionally does not wire tracing into live Codex execution.
It establishes the data model and reducer contract first, with
crate-local tests covering conversation reconstruction, compaction
boundaries, tool/session edges, and code-cell lifecycle reduction. Later
PRs emit into this model.
The README is the best entry point for reviewing the intended trace
format and reduction semantics before diving into the reducer modules.
## Summary
- Adds a process-local, in-memory cookie store for ChatGPT HTTP clients.
- Limits cookie storage and replay to a shared ChatGPT host allowlist.
- Wires the shared store into the default Codex reqwest client and
backend client.
- Shares the ChatGPT host allowlist with remote-control URL validation
to avoid drift.
- Enables reqwest cookie support and updates lockfiles.
## Summary
This PR fully reverts the previously merged Agent Identity runtime
integration from the old stack:
https://github.com/openai/codex/pull/17387/changes
It removes the Codex-side task lifecycle wiring, rollout/session
persistence, feature flag plumbing, lazy `auth.json` mutation,
background task auth paths, and request callsite changes introduced by
that stack.
This leaves the repo in a clean pre-AgentIdentity integration state so
the follow-up PRs can reintroduce the pieces in smaller reviewable
layers.
## Stack
1. This PR: full revert
2. https://github.com/openai/codex/pull/18871: move Agent Identity
business logic into a crate
3. https://github.com/openai/codex/pull/18785: add explicit
AgentIdentity auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate auth callsites
through AuthProvider
## Testing
Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
## Why
The device-key protocol needs an app-server implementation that keeps
local key operations behind the same request-processing boundary as
other v2 APIs.
app-server owns request dispatch, transport policy, documentation, and
JSON-RPC error shaping. `codex-device-key` owns key binding, validation,
platform provider selection, and signing mechanics. Keeping the adapter
thin makes the boundary easier to review and avoids moving local
key-management details into thread orchestration code.
## What changed
- Added `DeviceKeyApi` as the app-server adapter around
`DeviceKeyStore`.
- Converted protocol protection policies, payload variants, algorithms,
and protection classes to and from the device-key crate types.
- Encoded SPKI public keys and DER signatures as base64 protocol fields.
- Routed `device/key/create`, `device/key/public`, and `device/key/sign`
through `MessageProcessor`.
- Rejected remote transports before provider access while allowing local
`stdio` and in-process callers to reach the device-key API.
- Added stdio, in-process, and websocket tests for device-key validation
and transport policy.
- Documented the device-key methods in the app-server v2 method list.
## Test coverage
- `device_key_create_rejects_empty_account_user_id`
- `in_process_allows_device_key_requests_to_reach_device_key_api`
- `device_key_methods_are_rejected_over_websocket`
## Stack
This is PR 3 of 4 in the device-key app-server stack. It is stacked on
#18429.
## Validation
- `cargo test -p codex-app-server device_key`
- `just fix -p codex-app-server`
## Summary
Adds main-chat shortcuts for changing reasoning effort one step at a
time:
- `Alt+,` lowers reasoning (has the `<` arrow on the key)
- `Alt+.` raises reasoning (similarly, has the `>` arrow)
The shortcut updates the active session only. It does not persist the
selected reasoning level as the default for future sessions. In Plan
mode, it applies temporarily to Plan mode without opening the
global-vs-Plan scope prompt.
## Details
The shortcut uses the active model preset to decide which reasoning
levels are valid. If the current session has no explicit reasoning
effort, it starts from the model default. Each keypress moves to the
next supported level in the requested direction.
The shortcut only runs from the main chat surface. If a popup or modal
is open, input remains owned by that UI.
In Plan mode, the shortcut updates the in-memory Plan reasoning override
directly. The model/reasoning picker still keeps the existing scope
prompt for explicit picker changes.
## Notes
Ctrl-plus and Ctrl-minus were considered, but terminals do not deliver
those combinations consistently, so this PR uses Alt shortcuts instead.
If the current effort is unsupported by the selected model, the shortcut
skips to the nearest supported level in the requested direction. If
there is no valid step, it shows the existing boundary message.
## Tests
- `cargo test -p codex-tui reasoning_shortcuts`
- `cargo test -p codex-tui reasoning_effort`
- `cargo test -p codex-tui reasoning_shortcut`
- `cargo test -p codex-tui footer_snapshots`
- `cargo test -p codex-tui`
- `just fix -p codex-tui`
- `./tools/argument-comment-lint/run.py -p codex-tui -- --tests`
---------
Co-authored-by: Eric Traut <etraut@openai.com>
## Why
PR #18431 exposed a Bazel clippy failure in the app-server unit-test
target across Linux, macOS, and Windows. The failing lint was
`clippy::await_holding_invalid_type`: two tracing tests serialized
access to global tracing state by holding a `tokio::sync::MutexGuard`
across awaited test work.
That serialization is still needed because the tests share
process-global tracing setup and exporter state, but it should not
require holding an async mutex guard through the whole test body.
## What changed
- Replaced the bespoke async `tracing_test_guard` helper with
`serial_test` on the two tracing tests that need global tracing
serialization.
- Removed the `#[expect(clippy::await_holding_invalid_type)]`
annotations and the lock guard callsites that Bazel clippy rejected.
## Validation
- `cargo test -p codex-app-server jsonrpc_span`
- `just fix -p codex-app-server`
- `git diff --check`
I also attempted the exact failing Bazel clippy target locally with
BuildBuddy disabled: `bazel --noexperimental_remote_repo_contents_cache
build --config=clippy --bes_backend= --remote_cache=
--experimental_remote_downloader= --
//codex-rs/app-server:app-server-unit-tests-bin`. That run did not reach
clippy because Bazel timed out downloading `libcap-2.27.tar.gz` from
`kernel.org`.
## Why
Device-key storage and signing are local security-sensitive operations
with platform-specific behavior. Keeping the core API in
`codex-device-key` keeps app-server focused on routing and business
logic instead of owning key-management details.
The crate keeps the signing surface intentionally narrow: callers can
create a bound key, fetch its public key, or sign one of the structured
payloads accepted by the crate. It does not expose a generic
arbitrary-byte signing API.
Key IDs cross into platform-specific labels, tags, and metadata paths,
so externally supplied IDs are constrained to the same auditable
namespace created by the crate: `dk_` followed by unpadded base64url for
32 bytes. Remote-control target paths are also tied to each signed
payload shape so connection proofs cannot be reused for enrollment
endpoints, or vice versa.
## What changed
- Added the `codex-device-key` workspace crate.
- Added account/client-bound key creation with stable `dk_` key IDs.
- Added strict `key_id` validation before public-key lookup or signing
reaches a provider.
- Added public-key lookup and structured signing APIs.
- Split remote-control client endpoint allowlists by connection vs
enrollment payload shape.
- Added validation for key bindings, accepted payload fields, token
expiration, and payload/key binding mismatches.
- Added flow-oriented docs on the validation helpers that gate provider
signing.
- Added protection policy and protection-class types without wiring a
platform provider yet.
- Added an unsupported default provider so platforms without an
implementation fail explicitly instead of silently falling back to
software-backed keys.
- Updated Cargo and Bazel lock metadata for the new crate and
non-platform-specific dependencies.
## Stack
This is stacked on #18428.
## Validation
- `cargo test -p codex-device-key`
- Added unit coverage for strict `key_id` validation before provider
use.
- Added unit coverage that rejects remote-control paths from the wrong
signed payload shape.
- `just bazel-lock-update`
- `just bazel-lock-check`
## Summary
This is the runtime/foundation half of the Windows sandbox unified-exec
work.
- add Windows sandbox `unified_exec` session support in
`windows-sandbox-rs` for both:
- the legacy restricted-token backend
- the elevated runner backend
- extend the PTY/process runtime so driver-backed sessions can support:
- stdin streaming
- stdout/stderr separation
- exit propagation
- PTY resize hooks
- add Windows sandbox runtime coverage in `codex-windows-sandbox` /
`codex-utils-pty`
This PR does **not** enable Windows sandbox `UnifiedExec` for product
callers yet because hooking this up to app-server comes in the next PR.
Windows sandbox advertising is intentionally kept aligned with `main`,
so sandboxed Windows callers still fall back to `ShellCommand`.
This PR isolates the runtime/session layer so it can be reviewed
independently from product-surface enablement.
---------
Co-authored-by: jif-oai <jif@openai.com>
Co-authored-by: Codex <noreply@openai.com>
## Why
Permission approval responses must not be able to grant more access than
the tool requested. Moving this flow to `PermissionProfile` means the
comparison must be profile-shaped instead of `SandboxPolicy`-shaped, and
cwd-relative special paths such as `:cwd` and `:project_roots` must stay
anchored to the turn that produced the request.
## What changed
This implements semantic `PermissionProfile` intersection in
`codex-sandboxing` for file-system and network permissions. The
intersection accepts narrower path grants, rejects broader grants,
preserves deny-read carve-outs and glob scan depth, and materializes
cwd-dependent special-path grants to absolute paths before they can be
recorded for reuse.
The request-permissions response paths now use that intersection
consistently. App-server captures the request turn cwd before waiting
for the client response, includes that cwd in the v2 approval params,
and core stores the requested profile plus cwd for direct TUI/client
responses and Guardian decisions before recording turn- or
session-scoped grants. The TUI app-server bridge now preserves the
app-server request cwd when converting permission approval params into
core events.
## Verification
- `cargo test -p codex-sandboxing intersect_permission_profiles --
--nocapture`
- `cargo test -p codex-app-server request_permissions_response --
--nocapture`
- `cargo test -p codex-core
request_permissions_response_materializes_session_cwd_grants_before_recording
-- --nocapture`
- `cargo check -p codex-tui --tests`
- `cargo check --tests`
- `cargo test -p codex-tui
app_server_request_permissions_preserves_file_system_permissions`
## Summary
The TUI app refactor in #18753 moved the old `app.rs` tests into a
single `app/tests.rs` file. That kept the split mechanically simple, but
it left several focused unit tests far from the modules they exercise.
This PR is a follow-up that moves tests next to the code they cover.
It also adds `tui/src/app/test_support.rs` for shared fixture
construction.
This is just a mechanical refactoring (no functional changes) and does
not affect any production code.
## Why
`debug_clear_memories_resets_state_and_removes_memory_dir` can be flaky
because the test drops its `sqlx::SqlitePool` immediately before
invoking `codex debug clear-memories`. Dropping the pool does not wait
for all SQLite connections to close, so the CLI can race with still-open
test connections.
## What changed
- Await `pool.close()` before spawning `codex debug clear-memories`.
- Close the reopened verification pool before the temp `CODEX_HOME` is
torn down.
## Verification
- `cargo test -p codex-cli --test debug_clear_memories
debug_clear_memories_resets_state_and_removes_memory_dir`
Fixes#17954.
## Why
When a manual shell command like `!sleep 10` is running, submitting
plain text such as `hi` currently sends that text as a steer for the
active shell turn. User shell turns are not steerable like model turns,
so the TUI can remain stuck in `Working` after the shell command
finishes.
## What Changed
- Detect when the only active work is one or more
`ExecCommandSource::UserShell` commands.
- Queue plain submitted input in that state so it drains after the shell
command and shell turn complete.
- Preserve `!cmd` submissions during running work so explicit shell
commands keep their existing behavior.
- Add regression coverage for the `!sleep 10` plus `hi` flow in
`chatwidget::tests::exec_flow::user_message_during_user_shell_command_is_queued_not_steered`.
## Verification
- Manually confirmed hang before the fix and no hang after the fix
## Summary
- wrap OSC 9 notifications in tmux's DCS passthrough so terminal
notifications make it through tmux
- use codex-terminal-detection for OSC 9 auto-selection so tmux sessions
inherit the underlying client terminal support
- add focused notification backend tests for plain OSC 9 and
tmux-wrapped output
## Stack
- base PR: #18479
- review order: #18479, then this PR
## Why
Tmux does not forward OSC 9 notifications directly; the sequence has to
be wrapped in tmux's DCS passthrough envelope. Codex also had local
notification heuristics that could miss supported terminals when running
under tmux, even though codex-terminal-detection already knows how to
attribute tmux sessions to the client terminal.
## Validation
- `just fmt`
- `cargo test -p codex-tui` *(currently blocked by an unrelated existing
compile error in `app-server/src/message_processor.rs:754` referencing
`connection_id` out of scope; not caused by this change)*
Co-authored-by: Codex <noreply@openai.com>
## Summary
- attach the authoritative Codex thread id to MCP tool request
`_meta.threadId` for model-initiated tool calls
- attach the same thread id for manual `mcpServer/tool/call` requests
before invoking the MCP server
- cover both metadata helper behavior and the manual app-server MCP path
in tests
needed because the Rust app-server is the last place that still has
authoritative knowledge of “this model-generated MCP tool call belongs
to conversation/thread X” before the request leaves Codex and reaches
Hoopa. It adds threadId to MCP request metadata in the model-generated
tool-call path, using sess.conversation_id, and also does the same for
the manual mcpServer/tool/call path.
## Test plan
- `cargo test -p codex-core
mcp_tool_call_thread_id_meta_is_added_to_request_meta --lib`
- `cargo test -p codex-app-server
mcp_server_tool_call_returns_tool_result`
Paired Hoopa consumer PR: https://github.com/openai/openai/pull/833263
## Why
Clients need a stable app-server protocol surface for enrolling a local
device key, retrieving its public key, and producing a device-bound
proof.
The protocol reports `protectionClass` explicitly so clients can
distinguish hardware-backed keys from an explicitly allowed OS-protected
fallback. Signing uses a tagged `DeviceKeySignPayload` enum rather than
arbitrary bytes so each signed statement is auditable at the API
boundary.
## What changed
- Added v2 JSON-RPC methods for `device/key/create`,
`device/key/public`, and `device/key/sign`.
- Added request/response types for device-key metadata, SPKI public
keys, protection classes, and ECDSA signatures.
- Added `DeviceKeyProtectionPolicy` with hardware-only default behavior
and an explicit `allow_os_protected_nonextractable` option.
- Added the initial `remoteControlClientConnection` signing payload
variant.
- Regenerated JSON Schema and TypeScript fixtures for app-server
clients.
## Stack
This is PR 1 of 4 in the device-key app-server stack.
## Validation
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
The `test-log` crate is only used by `codex-core` tests, so it does not
need
to be part of the normal `codex-core` dependency graph. Keeping
`test-log` in
`dev-dependencies` removes it from normal `codex-core` builds and keeps
the
production dependency set a little smaller.
Verification:
- `cargo tree -p codex-core --edges normal --invert test-log`
- `cargo check -p codex-core --lib`
- `cargo test -p codex-core --lib`
This add with 2 entry point:
* `reset_git_repository` that takes a directory and set it as a new git
root
* `diff_since_latest_init` this returns the diff for a given directory
since the last `reset_git_repository`
## What changed
This PR makes the default Cargo dev profile use line-tables-only debug
info:
```toml
[profile.dev]
debug = 1
```
That keeps useful backtraces while avoiding the cost of full variable
debug
info in normal local dev builds.
This also makes the Bazel CI setting explicit with `-Cdebuginfo=0` for
target
and exec-configuration Rust actions. Bazel/rules_rust does not read
Cargo
profiles for this setting, and the current fastbuild action already
emitted
`--codegen=debuginfo=0`; the Bazel part of this PR makes that choice
direct in
our build configuration.
## Why
The slow codex-core rebuilds are dominated by debug-info codegen, not
parsing
or type checking. On a warm-dependency package rebuild, the baseline
codex-core compile was about 39.5s wall / 38.9s rustc total, with
codegen_crate around 14.0s and LLVM_passes around 13.4s. Setting
codex-core
to line-tables-only debug info brought that to about 27.2s wall / 26.7s
rustc
total, with codegen_crate around 3.1s and LLVM_passes around 2.8s.
`debug = 0` was only about another 0.7s faster than `debug = 1` in the
codex-core measurement, so `debug = 1` is the better default dev
tradeoff: it
captures nearly all of the compile-time win while preserving basic
debuggability.
I also sampled other first-party crates instead of keeping a
codex-core-only
package override. codex-app-server showed the same pattern: rustc total
dropped from 15.85s to 10.48s, while codegen_crate plus LLVM_passes
dropped
from about 13.47s to 3.23s. codex-app-server-protocol had a smaller but
still
real improvement, 16.05s to 14.58s total, and smaller crates showed
modest
wins. That points to a workspace dev-profile policy rather than a
hand-maintained list of large crates.
## Relationship to #18612
[#18612](https://github.com/openai/codex/pull/18612) added the
`dev-small`
profile. That remains useful when someone wants a working local build
quickly
and is willing to opt in with `cargo build --profile dev-small`.
This PR is deliberately less aggressive: it changes the common default
dev
profile while preserving line tables/backtraces. `dev-small` remains the
explicit "build quickly, no debuggability concern" path.
## Other investigation
I looked for another structural win comparable to
[#16631](https://github.com/openai/codex/pull/16631) and
[#16630](https://github.com/openai/codex/pull/16630), but did not find
one.
The attempted TOML monomorphization changes were noisy or worse in
measurement, and the async task changes reduced some instantiations but
only
translated to roughly a one-second improvement while being much more
disruptive. The debug-info setting was the one repeatable, material win
that
survived measurement.
## Verification
- `just bazel-lock-update`
- `just bazel-lock-check`
- `cargo check -p codex-core --lib`
- `cargo test -p codex-core --lib`
- Bazel `aquery --config=ci-linux` confirmed `--codegen=debuginfo=0` and
`-Cdebuginfo=0` for `//codex-rs/core:core`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18844).
* #18846
* __->__ #18844
## Summary
- Move external agent config migration logic and tests from `codex-core`
into `app-server/src/config`.
- Keep the migration service crate-private to app-server and update the
API adapter imports.
- Remove stale core re-exports and expose only the needed marketplace
source helper.
## Testing
- `cargo test -p codex-app-server config::external_agent_config`
- `just fmt`
- `just fix -p codex-app-server`
- `just fix -p codex-core`
- `git diff --check`
Fixes https://github.com/openai/codex/issues/13638
## Why
VS Code's integrated terminal can run a Linux shell through WSL without
exposing `TERM_PROGRAM` to the Linux process, and with crossterm
keyboard enhancement flags enabled that environment can turn dead-key
composition into malformed key events instead of composed Unicode input.
Codex already handles composed Unicode correctly, so the fix is to avoid
enabling the terminal mode that breaks this path for the affected
terminal combination.
## What Changed
- Automatically skip crossterm keyboard enhancement flags when Codex
detects WSL plus VS Code, including a Windows-side `TERM_PROGRAM` probe
through WSL interop.
- Add `CODEX_TUI_DISABLE_KEYBOARD_ENHANCEMENT` so users can
force-disable or force-enable the keyboard enhancement policy for
diagnosis.
## Verification
- Added unit coverage for env parsing, VS Code detection, and the WSL/VS
Code auto-disable policy.
- `cargo check -p codex-tui` passed.
- `./tools/argument-comment-lint/run.py -p codex-tui -- --tests` passed.
- `cargo test -p codex-tui` was attempted locally, but the checkout
failed during linking before tests executed because V8 symbols from
`codex-code-mode` were unresolved for `arm64`.
## What
- Explicitly show our "bash mode" by changing the color and adding a
callout similar to how we do for `Plan mode (shift + tab to cycle)`
- Also replace our `›` composer prefix with a bang `!`

## Why
- It was unclear that we had a Bash mode
- This feels more responsive
- It looks cool!
---------
Co-authored-by: Codex <noreply@openai.com>
Deferred dynamic tools need to round-trip a namespace so a tool returned
by `tool_search` can be called through the same registry key that core
uses for dispatch.
This change adds namespace support for dynamic tool specs/calls,
persists it through app-server thread state, and routes dynamic tool
calls by full `ToolName` while still sending the app the leaf tool name.
Deferred dynamic tools must provide a namespace; non-deferred dynamic
tools may remain top-level.
It also introduces `LoadableToolSpec` as the shared
function-or-namespace Responses shape used by both `tool_search` output
and dynamic tool registration, so dynamic tools use the same wrapping
logic in both paths.
Validation:
- `cargo test -p codex-tools`
- `cargo test -p codex-core tool_search`
---------
Co-authored-by: Sayan Sisodiya <sayan@openai.com>
Follow-up to https://github.com/openai/codex/pull/18178, where we said
the await-holding clippy rule would be enabled separately.
Enable `await_holding_lock` and `await_holding_invalid_type` after the
preceding commits fixed or explicitly documented the current offenders.
## Why
This PR prepares the stack to enable Clippy await-holding lints that
were left disabled in #18178. The mechanical lock-scope cleanup is
handled separately; this PR is the documentation/configuration layer for
the remaining await-across-guard sites.
Without explicit annotations, reviewers and future maintainers cannot
tell whether an await-holding warning is a real concurrency smell or an
intentional serialization boundary.
## What changed
- Configures `clippy.toml` so `await_holding_invalid_type` also covers
`tokio::sync::{MutexGuard,RwLockReadGuard,RwLockWriteGuard}`.
- Adds targeted `#[expect(clippy::await_holding_invalid_type, reason =
...)]` annotations for intentional async guard lifetimes.
- Documents the main categories of intentional cases: active-turn state
transitions that must remain atomic, session-owned MCP manager accesses,
remote-control websocket serialization, JS REPL kernel/process
serialization, OAuth persistence, external bearer token refresh
serialization, and tests that intentionally serialize shared global or
session-owned state.
- For external bearer token refresh, documents the existing
serialization boundary: holding `cached_token` across the provider
command prevents concurrent cache misses from starting duplicate refresh
commands, and the current behavior is small enough that an explicit
expectation is easier to maintain than adding another synchronization
primitive.
## Verification
- `cargo clippy -p codex-login --all-targets`
- `cargo clippy -p codex-connectors --all-targets`
- `cargo clippy -p codex-core --all-targets`
- The follow-up PR #18698 enables `await_holding_invalid_type` and
`await_holding_lock` as workspace `deny` lints, so any undocumented
remaining offender will fail Clippy.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18423).
* #18698
* __->__ #18423