## Why
Remote-control clients need to list and revoke controller-device grants
without enabling or enrolling the local relay. These are signed-in
account-management operations, so coupling them to websocket, pairing,
enrollment, or persisted relay state would prevent clients from managing
stale grants from the picker.
Related enhancement request: N/A. This adds the Codex app-server surface
for the planned upstream environment-scoped revoke endpoint.
## What Changed
- Added experimental app-server v2 RPCs:
  - `remoteControl/client/list`
  - `remoteControl/client/revoke`
- Added picker-oriented protocol types and standard generated schema
fixtures. The list response intentionally omits backend account id,
enrollment status, and location fields.
- Added `app-server-transport/src/transport/remote_control/clients.rs`
for environment-scoped GET and DELETE requests. It builds escaped URL
path segments, forwards optional pagination query fields, sends ChatGPT
auth plus `chatgpt-account-id`, converts RFC3339 `last_seen_at` values
to Unix seconds, accepts `204 No Content` revoke responses, and retries
once after a `401`.
- Extracted shared ChatGPT auth loading and recovery into
`app-server-transport/src/transport/remote_control/auth.rs` so
websocket, pairing, and client management use the same account-auth
boundary.
- Retained the configured remote-control base URL on
`RemoteControlHandle` and resolve management URLs lazily, preserving
deferred validation while relay startup is disabled.
- Registered list as `global_shared_read("remote-control-clients")` and
revoke as `global("remote-control-clients")`.
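The retry rule described above (retry once after a `401`, never after a `403`) can be sketched as follows. This is a minimal illustration, not the code in `clients.rs`: the `Attempt` type and `send_with_single_auth_retry` are hypothetical names, and the request and auth recovery are modelled as plain closures.

```rust
/// Hypothetical outcome of one management request attempt.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Attempt {
    /// A 2xx status, including `204 No Content` for revoke.
    Success(u16),
    /// Any non-success status code.
    Failure(u16),
}

/// Runs `send` once, and once more after `refresh_auth` if the first attempt
/// returned `401`. Any other status (for example `403`) is returned unchanged.
pub fn send_with_single_auth_retry<S, R>(mut send: S, mut refresh_auth: R) -> Attempt
where
    S: FnMut() -> Attempt,
    R: FnMut(),
{
    let first = send();
    if first == Attempt::Failure(401) {
        // Assumption: auth recovery is a side effect that the next send observes.
        refresh_auth();
        return send();
    }
    first
}
```

A second `401` is returned to the caller as-is, so the request is sent at most twice.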
## Verification
- Added transport coverage proving list and revoke work while relay
state is disabled, IDs are escaped, picker-only fields are returned,
timestamps are converted, revoke accepts `204`, auth headers are
forwarded, `401` retries exactly once, `403` is not retried, and
malformed list payloads retain decode context.
- Added an app-server integration test proving both JSON-RPC methods
work before relay enablement and successful revoke returns `{}`.
- Regenerated and validated experimental and standard app-server schema
fixtures.
## Summary
Allow EDU ChatGPT workspaces to fetch cloud config bundles. The existing
cloud config eligibility gate only allowed business-like and enterprise
plans, which meant EDU admins could configure managed policies in the UI
but the Codex client would skip fetching them.
This keeps individual/pro and team-like usage-based plans excluded, and
adds service-level coverage for both `edu` and `education` plan aliases.
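A rough sketch of the eligibility gate after this change. Only the `edu` and `education` aliases come from this description; the other plan strings and the function name are assumptions for illustration and may not match the identifiers in `codex-cloud-config`.

```rust
/// Returns true when a workspace plan may fetch cloud config bundles.
/// Assumption: plans arrive as lowercase-insensitive strings; `business` and
/// `enterprise` are placeholder names for the previously allowed plans.
pub fn plan_allows_cloud_config(plan: &str) -> bool {
    matches!(
        plan.trim().to_ascii_lowercase().as_str(),
        "business" | "enterprise" | "edu" | "education"
    )
}
```

Anything not listed, such as individual or usage-based team plans, stays excluded by default.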
## Validation
- `just fmt`
- `just test -p codex-cloud-config`
- Built the Codex app locally, created a new EDU ChatGPT workspace, and
verified config bundles can be fetched and are properly applied.
## Disclaimer
Do not use for now
## Why
Extensions can already contribute prompt fragments and request same-turn
item injection, but there was no host-owned hook for contributing
structured `ResponseItem`s while Codex is assembling a new turn's
initial model input. This change adds that seam so extensions can attach
turn-local input that depends on the submitted user input and resolved
turn environments without routing through prompt text or late injection.
## What changed
- add `TurnInputContributor` to `codex_extension_api` and export the new
`TurnInputContext` / `TurnInputEnvironment` types it receives
- teach `ExtensionRegistry` to register and expose turn-input
contributors alongside the existing extension hooks
- call registered turn-input contributors from
`core/src/session/turn.rs` while building the initial injected input for
a turn, then append their returned `ResponseItem`s after the skill and
plugin injections
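The shape of the new seam can be sketched as below. This is a simplified model under stated assumptions: the real hook may be async and the real `TurnInputContext` and `ResponseItem` carry more data. The struct fields, `register_turn_input_contributor`, `build_initial_input`, and `EnvironmentNote` are hypothetical.

```rust
/// Stand-in for the real `ResponseItem`; only a text payload is modelled.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResponseItem(pub String);

/// Simplified stand-in for `TurnInputContext`.
pub struct TurnInputContext<'a> {
    pub user_input: &'a str,
    pub environment_ids: &'a [&'a str],
}

/// Hypothetical synchronous shape of the contributor hook.
pub trait TurnInputContributor {
    fn contribute(&self, ctx: &TurnInputContext<'_>) -> Vec<ResponseItem>;
}

#[derive(Default)]
pub struct ExtensionRegistry {
    turn_input_contributors: Vec<Box<dyn TurnInputContributor>>,
}

impl ExtensionRegistry {
    pub fn register_turn_input_contributor(&mut self, contributor: Box<dyn TurnInputContributor>) {
        self.turn_input_contributors.push(contributor);
    }
}

/// Appends contributor output after the skill and plugin injections.
pub fn build_initial_input(
    registry: &ExtensionRegistry,
    ctx: &TurnInputContext<'_>,
    skill_and_plugin_items: Vec<ResponseItem>,
) -> Vec<ResponseItem> {
    let mut items = skill_and_plugin_items;
    for contributor in &registry.turn_input_contributors {
        items.extend(contributor.contribute(ctx));
    }
    items
}

/// Example contributor that emits one item per resolved environment.
pub struct EnvironmentNote;

impl TurnInputContributor for EnvironmentNote {
    fn contribute(&self, ctx: &TurnInputContext<'_>) -> Vec<ResponseItem> {
        ctx.environment_ids
            .iter()
            .map(|id| ResponseItem(format!("environment: {id}")))
            .collect()
    }
}
```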
## Why
`PermissionProfile` is becoming the default way to represent Codex
permissions, but the implicit default behavior should stay the same for
now:
- trusted projects use `:workspace`
- untrusted projects also use `:workspace`
- roots without a trust decision use `:read-only`
- unsandboxed Windows falls back to `:read-only`
This keeps the existing sandbox semantics while making silent config
defaults observable as built-in permission profiles instead of treating
the legacy `SandboxPolicy` projection as the primary shape.
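The implicit defaults listed above amount to a small decision table. The sketch below is illustrative only: `TrustDecision`, `implicit_default_profile`, and the `sandbox_available` flag (standing in for "not unsandboxed Windows") are hypothetical names, and the precedence of the Windows fallback over the trust decision is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrustDecision {
    Trusted,
    Untrusted,
}

/// Returns the built-in profile name used when no sandbox mode is configured.
pub fn implicit_default_profile(
    trust: Option<TrustDecision>,
    sandbox_available: bool,
) -> &'static str {
    // Assumption: unsandboxed Windows wins over any trust decision.
    if !sandbox_available {
        return ":read-only";
    }
    match trust {
        // Trusted and untrusted projects both default to `:workspace`.
        Some(TrustDecision::Trusted) | Some(TrustDecision::Untrusted) => ":workspace",
        // Roots without a trust decision stay read-only.
        None => ":read-only",
    }
}
```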
## What Changed
- Refactored legacy sandbox derivation to resolve the configured sandbox
mode once, then apply the implicit project fallback only when no sandbox
mode was configured.
- Preserved the existing trust-decision fallback: trusted and untrusted
projects default to workspace-write where supported.
- Added empty-config coverage asserting that an untrusted project
resolves to the built-in active permission profile (`:workspace` outside
unsandboxed Windows).
## Verification
- `just fmt`
- `just test -p codex-core 'config::'`
- `just test -p codex-config`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/25926).
* __->__ #25926
## Disclaimer
This exists only for iteration purposes. Do not make any code rely on
it.
## Why
Skills still live behind `codex-core` discovery and injection paths, but
the extension system needs an authority-aware home before that logic can
move. This adds that boundary without changing current skills behavior,
and keeps host, executor, and remote skills distinct so future
list/read/search flows do not collapse back to ambient local paths.
## What changed
- Add the `codex-skills-extension` workspace/Bazel crate under
`ext/skills`.
- Define the initial catalog, authority, provider, and turn-state types
for authority-bound skill packages and resources.
- Register placeholder thread/config/prompt/turn lifecycle contributors
plus host, executor, and remote provider aggregation points.
- Capture the remaining extraction work as TODOs, including the missing
extension API hooks needed for per-turn catalog construction and typed
skill injection.
- Keep plugins outside the runtime skills model: plugin-installed skills
are treated as materialized host-owned skill sources once available.
## Verification
- Not run locally.
## Summary
- stop publishing Python runtime wheels as a side effect of Rust
releases
- publish runtime wheels from the Python SDK release workflow, either
explicitly before updating the SDK pin or immediately before a
`python-v*` SDK release
- resolve the runtime release from the requested version or the SDK
package's exact `openai-codex-cli-bin` pin
- build two musllinux-tagged wheels from the Rust-release Linux package
archives alongside the six existing runtime wheels
- validate SDK beta tags before any PyPI write
## Release configuration
- update the `openai-codex-cli-bin` PyPI trusted publisher to trust
`.github/workflows/python-sdk-release.yml` and the
`publish-python-runtime` job
## Pin update flow
- run the `python-sdk-release` workflow manually with the new runtime
version before opening or updating the SDK pin PR
- after the pin lands, a `python-v*` SDK tag republishes with
`skip-existing: true` before publishing the SDK package
## Validation
- ran `just fmt`
- validated the edited workflow YAML
- validated the embedded `publish-python-runtime` Bash with `bash -n`
- validated manual `0.136.0 -> rust-v0.136.0` mapping
- validated tag-driven `python-v0.1.0b3 -> 0.132.0 -> rust-v0.132.0`
mapping
- validated rejection of an invalid SDK tag before publication
- confirmed `rust-v0.136.0` contains the two required Linux package
archives
- CI will provide the full test signal
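The two version mappings validated above can be expressed as a short sketch. The workflow itself implements this in Bash; the Rust below only illustrates the mapping, and both function names are hypothetical.

```rust
/// Maps a runtime version such as `0.136.0` to the Rust release tag
/// `rust-v0.136.0`.
pub fn rust_release_tag(runtime_version: &str) -> String {
    format!("rust-v{runtime_version}")
}

/// Extracts the version from an exact pin such as
/// `openai-codex-cli-bin==0.132.0`. Anything other than an exact `==` pin on
/// that package yields `None`.
pub fn pinned_runtime_version(requirement: &str) -> Option<&str> {
    let (name, version) = requirement.trim().split_once("==")?;
    let version = version.trim();
    if name.trim() == "openai-codex-cli-bin" && !version.is_empty() {
        Some(version)
    } else {
        None
    }
}
```

For the tag-driven path, the SDK package's pin is read first and then mapped with `rust_release_tag`.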
## Why
Standalone image generation remained top-level-only in code-mode
sessions.
## What changed
- Change imagegen exposure from `DirectModelOnly` to `Direct`.
- Keep direct-mode access while enabling nested code-mode access.
- Add a focused regression test for the exposure contract.
## Validation
- `just test -p codex-image-generation-extension`
## Why
`profile_sandbox_mode` was left over from the old selected legacy
profile path. Production now always derives permissions without that
value, and legacy profile contents are ignored, so keeping a parameter
that is always `None` makes `derive_permission_profile` look like it
still supports a fallback that no longer exists.
## What Changed
- Removed the `profile_sandbox_mode` argument from
`ConfigToml::derive_permission_profile`.
- Updated the production caller and legacy sandbox-policy test helper to
match.
- Dropped the stale unselected legacy-profile sandbox test that only
protected the removed fallback shape.
## Verification
- `just test -p codex-config`
- `just test -p codex-core 'config::'`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/25943).
* #25926
* __->__ #25943
## Stack
1. #25850 - Key request-permission grants by environment: stores and
applies sticky permission grants per environment id.
2. #25858 - Add `environmentId` to `request_permissions`: lets the model
target a selected environment and resolves relative permission paths
against it.
3. #25862 - Propagate permission approval environment id: carries the
selected environment id through approval events, app-server requests,
TUI prompts, and delegate forwarding.
4. This PR (#25867) - Add remote request permissions integration
coverage: verifies the selected remote environment across request,
approval, grant reuse, and exec.
This PR is stacked on #25862 and should be reviewed after #25850,
#25858, and #25862.
## Why
The environment-scoped permission stack needs one end-to-end check that
exercises the CCA-shaped path, not only unit-level parsing. This
verifies that a model-sent `environmentId` on `request_permissions`
reaches the approval event, stores the grant under the selected
environment, and is reused by a later tool call in that same
environment.
## What Changed
- Adds a remote executor integration test for `request_permissions` with
`environmentId: remote` and a relative write root.
- Asserts the permission event reports the remote environment and cwd,
and that the normalized grant resolves under the remote cwd.
- Approves the grant, then runs a remote `exec_command` without explicit
per-call permissions and verifies it completes without another exec
approval and writes only in the remote filesystem.
## Verification
- Not run locally per instruction.
- `git diff --check`
## Why
`code_mode_only` moved ordinary runtime tools behind `exec`, but it also
hid hosted Responses tools. Hosted `web_search` and `image_generation`
do not have a nested `exec` runtime path, so code-only sessions lost
those capabilities entirely even when their existing provider, auth,
model, and configuration gates passed.
## What changed
- Keep hosted Responses tools top-level in `code_mode_only` sessions
after their existing gates pass.
- Preserve the existing nested-tool behavior for ordinary runtimes and
the direct-only behavior for multi-agent v2 tools.
- Add planner coverage for `code_mode_only` with default multi-agent v2
settings, hosted live web search, and hosted image generation.
## Verification
- Added focused regression coverage in
`codex-rs/core/src/tools/spec_plan_tests.rs`.
- Left execution to CI per repository workflow.
## Summary
- Splits the monolithic `codex-cloud-config` implementation into focused
modules.
- Keeps behavior unchanged from the preceding config bundle runtime
switch.
## Details
This is the reviewability follow-up after the lineage-preserving
migration PRs. The split separates backend transport, loader
construction, cache handling, metrics, validation, service
orchestration, and focused tests into named files.
Verification: `just fmt`; `just test -p codex-cloud-config`.
## Why
Guardian review turns already submit a read-only `PermissionProfile`,
which is the permissions model the runtime should honor. Passing the
equivalent legacy `SandboxPolicy` through `ThreadSettingsOverrides`
keeps two representations of the same read-only constraint alive on this
path and makes the guardian flow depend on compatibility plumbing that
is being phased out.
## What Changed
- Set `sandbox_policy` to `None` when the guardian review session
submits its child `Op::UserInput`.
- Keep `permission_profile: Some(PermissionProfile::read_only())` and
`approval_policy: Some(AskForApproval::Never)`, so the guardian review
remains read-only and cannot request approvals.
- Remove the now-unused `SandboxPolicy` import and redundant comment
from `codex-rs/core/src/guardian/review_session.rs`.
## Verification
Not run locally; this is a narrow cleanup of redundant thread-settings
override state.
## Summary
- update the app-server image generation integration test to use
`TestAppServer`
- completes the test helper rename from #25701 for this newer test file
## Validation
- `cargo fmt -- --config imports_granularity=Item`
- `cargo check -p codex-app-server --test all`
Note: `just fmt` ran Rust formatting but failed on Python/SDK formatting
because the sandbox could not access the local `uv` cache.
## Summary
- Adapts the moved `codex-cloud-config` crate from the legacy cloud
requirements endpoint to the new config bundle endpoint.
- Switches runtime consumers from `CloudRequirementsLoader` to
`CloudConfigBundleLoader` so one shared bundle supplies cloud-delivered
config and requirements.
- Removes the legacy cloud requirements domain loader path.
## Details
This intentionally keeps `codex-cloud-config` monolithic for review
lineage: the previous PR establishes the crate move, and this PR shows
the behavior change against that moved implementation. A follow-up PR
splits the module back into focused files.
The new bundle path preserves the important cloud requirements loader
semantics where intended: account-scoped signed cache, 30-minute TTL,
5-minute refresh cadence, retry/backoff, auth recovery, and fail-closed
startup loading. The cached payload changes from a single requirements
TOML string to the backend-delivered bundle, and validation rejects
malformed config or requirements fragments before cache write/use.
## Summary
- carry `workspace_kind` from Responses API client metadata into the
turn resolved analytics fact
- serialize the optional value on `codex_turn_event`
- cover both the turn metadata source and turn event serialization
`workspace_kind` tells us whether a thread had a project attached or was
projectless. This is an indicator of who is adopting Codex for
knowledge work outside of coding.
## Testing
- `env UV_CACHE_DIR=/private/tmp/uv-cache
/private/tmp/cargo-tools/bin/just fmt`
- `env PATH=/private/tmp/cargo-tools/bin:$PATH
CARGO_HOME=/private/tmp/cargo-home UV_CACHE_DIR=/private/tmp/uv-cache
/private/tmp/cargo-tools/bin/just test -p codex-analytics`
- `env PATH=/private/tmp/cargo-tools/bin:$PATH
CARGO_HOME=/private/tmp/cargo-home UV_CACHE_DIR=/private/tmp/uv-cache
/private/tmp/cargo-tools/bin/just test -p codex-core turn_metadata`
Paired with openai/openai#970661, which keeps forwarding the same
metadata key through Responses API headers.
## Why
Fixes #24944.
On Windows, app-server resume could reject an active running thread when
the requested session path used normal `C:\...` form and the
already-running path used verbatim `\\?\C:\...` form. The paths point at
the same JSONL file, but the resume stale-path guard compared raw
`PathBuf`s, so desktop resume and heartbeat flows could fail with a
mismatched-path error.
## What Changed
- Compare requested and active rollout paths with
`path_utils::paths_match_after_normalization`.
- Extend the existing running-thread mismatched-path test with a
Windows-only same-file resume case before the stale-path rejection.
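To show why the raw comparison failed, here is a string-level sketch of the normalization idea. It is not the implementation of `path_utils::paths_match_after_normalization`; it only strips the verbatim `\\?\` prefix, ignores verbatim UNC paths and case differences, and works on strings so it behaves the same on any platform. Both function names are hypothetical.

```rust
/// Strips the Windows verbatim prefix (`\\?\`) if present.
fn strip_verbatim_prefix(path: &str) -> &str {
    path.strip_prefix(r"\\?\").unwrap_or(path)
}

/// Hypothetical stand-in: two paths match if they are equal once the
/// verbatim prefix is removed from each.
pub fn paths_match_after_normalization_sketch(a: &str, b: &str) -> bool {
    strip_verbatim_prefix(a) == strip_verbatim_prefix(b)
}
```

Comparing the two strings directly is false for the `C:\...` and `\\?\C:\...` forms of the same file, which is the mismatch the stale-path guard used to report.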
## Verification
- `just test -p codex-app-server
thread_resume_rejects_mismatched_path_for_running_thread_id`
## Summary
- Move Azure Trusted Signing values out of reusable workflow-call
secrets and into the `azure-artifact-signing` environment scope
- Attach the Windows signing job to the `azure-artifact-signing`
environment so it can resolve the signing secrets directly
- Stop inheriting caller secrets for the Windows release reusable
workflow
## Validation
- `git diff --check -- .github/workflows/rust-release.yml
.github/workflows/rust-release-windows.yml`
- `ruby -e 'require "yaml"; ARGV.each { |path| YAML.load_file(path);
puts "ok #{path}" }' .github/workflows/rust-release.yml
.github/workflows/rust-release-windows.yml`
## Summary
- pin the Python SDK runtime package to `openai-codex-cli-bin==0.136.0`
so Ubuntu/glibc installs resolve a compatible wheel
- refresh generated SDK artifacts and lock data for the runtime update
- keep newly generated client-message-id wire models internal to the
generated protocol layer
## Dependency
- merge #25906 first so the Python SDK release publishes both manylinux
and musllinux runtime wheels before publishing the package with this pin
## Validation
- ran `just fmt`
- regenerated the Python public API helpers
- validated the edited workflow YAML
- CI passed 29/29 checks
## Stack
1. #25850 - Key request-permission grants by environment: stores and
applies sticky permission grants per environment id.
2. #25858 - Add `environmentId` to `request_permissions`: lets the model
target a selected environment and resolves relative permission paths
against it.
3. This PR (#25862) - Propagate permission approval environment id:
carries the selected environment id through approval events, app-server
requests, TUI prompts, and delegate forwarding.
4. #25867 - Add remote request permissions integration coverage:
verifies the selected remote environment across request, approval, grant
reuse, and exec.
This PR is stacked on #25858, and #25867 is stacked on this PR.
## Why
PR2 lets the model bind a `request_permissions` call to a selected
environment, but the approval event and client-facing request still
needed to carry that binding. For CCA, the user-facing prompt and
delegated approval path should know which environment the grant applies
to instead of relying on cwd alone.
## What Changed
- Added optional `environmentId` to `RequestPermissionsEvent`.
- Emit the selected environment id from core permission approval events.
- Preserve the environment id through delegate forwarding, including
cwd-based delegated requests.
- Added `environmentId` to app-server permission approval params,
generated schema/TypeScript artifacts, and README examples.
- Preserve and display the environment id in TUI permission approval
prompts.
- Updated focused core, app-server protocol, and TUI conversion
coverage.
## Testing
Not run locally per instruction. Performed read-only `git diff --check`.
## Summary
- Teach the Windows release prebuild staging step to locate Rust/MSVC
PDBs emitted with crate-style underscore names.
- Stage PDBs under the shipped hyphenated binary names so the downstream
symbol archive step keeps the same artifact contract.
- Keep a fallback for already-hyphenated PDB names and fail with a clear
diagnostic if neither form exists.
## Root cause
The recent symbol publishing change in #25649 started copying
`${binary}.pdb` from `target/<triple>/release` during Windows prebuild
staging. Cargo still emits the `.exe` with the hyphenated binary name,
but MSVC PDBs for hyphenated Rust crates are emitted with underscores,
for example `codex_app_server.pdb` for `codex-app-server.exe`. The
release workflow was still building into the expected directory; the new
PDB copy step was looking for the wrong filename.
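The lookup order can be sketched as follows. The actual fix lives in the workflow's Bash staging step; this Rust version is only an illustration, with hypothetical function names and a plain list of file names standing in for the release directory.

```rust
/// Candidate PDB file names for a binary, underscore form first.
pub fn pdb_candidates(binary: &str) -> Vec<String> {
    let underscored = format!("{}.pdb", binary.replace('-', "_"));
    let hyphenated = format!("{binary}.pdb");
    if underscored == hyphenated {
        vec![hyphenated]
    } else {
        vec![underscored, hyphenated]
    }
}

/// Finds the emitted PDB among `release_dir_files`, or returns a diagnostic
/// naming every file name that was tried.
pub fn locate_pdb<'a>(binary: &str, release_dir_files: &[&'a str]) -> Result<&'a str, String> {
    let candidates = pdb_candidates(binary);
    for candidate in &candidates {
        let found = release_dir_files
            .iter()
            .copied()
            .find(|file| *file == candidate.as_str());
        if let Some(file) = found {
            return Ok(file);
        }
    }
    Err(format!(
        "no PDB found for {binary}; looked for {}",
        candidates.join(", ")
    ))
}

/// The staged name keeps the shipped hyphenated binary name.
pub fn staged_pdb_name(binary: &str) -> String {
    format!("{binary}.pdb")
}
```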
## Impact
This unblocks the `rust-release` Windows prebuilt-binary jobs for
hyphenated binaries while preserving the hyphenated PDB names consumed
by the final Windows release packaging and symbol archive steps.
## Validation
- `just fmt` from `codex-rs`
- `git diff --check -- .github/workflows/rust-release-windows.yml`
- Parsed `.github/workflows/rust-release-windows.yml` as YAML locally
- Local bash staging sanity test for both underscore-emitted and
hyphenated PDB filenames
## Why
Standalone image-generation extensions emitted turn items through the
low-level event path, bypassing host-owned finalization such as image
persistence and contributor processing. At the same time, the
generated-image save-path hint must remain visible to the model through
the extension tool's `FunctionCallOutput`, rather than the legacy
built-in developer-message path.
## What changed
- Extended `ExtensionTurnItem` to support image-generation items while
keeping the extension-facing emitter API limited to `emit_started` and
`emit_completed`.
- Routed extension completion through core `finalize_turn_item`, so
standalone image-generation items receive host-owned processing and
persisted `saved_path` values before publication.
- Kept legacy built-in image generation on its existing
developer-message hint path, while standalone image generation returns
its deterministic saved-path hint in `FunctionCallOutput`.
- Shared the image artifact path and output-hint formatting used by core
and the image-generation extension.
- Passed thread identity through extension tool calls so standalone
image generation can construct the same intended artifact path as core.
- Added an app-server integration test covering real standalone image
generation, saved artifact publication, model-visible output hint
wiring, and absence of the legacy developer-message hint.
## Validation
- `just fmt`
- `just test -p codex-image-generation-extension`
- `just test -p codex-web-search-extension`
- `just test -p codex-goal-extension`
- `just test -p codex-memories-extension`
- Targeted `codex-core` tests for image save history, extension
completion finalization, and contributor execution
- `just test -p codex-app-server
standalone_image_generation_returns_saved_path_hint_to_model`
- `just fix -p codex-core`
- `just fix -p codex-image-generation-extension`
- `just bazel-lock-update`
- `just bazel-lock-check`
## Stack
1. #25850 - Key request-permission grants by environment: stores and
applies sticky permission grants per environment id.
2. This PR (#25858) - Add `environmentId` to `request_permissions`: lets
the model target a selected environment and resolves relative permission
paths against it.
3. #25862 - Propagate permission approval environment id: carries the
selected environment id through approval events, app-server requests,
TUI prompts, and delegate forwarding.
4. #25867 - Add remote request permissions integration coverage:
verifies the selected remote environment across request, approval, grant
reuse, and exec.
This PR is stacked on #25850; #25862 and #25867 are stacked on this PR.
## Why
PR1 made request-permission grants internally environment-keyed, but the
model-facing `request_permissions` tool could still only target the
primary environment. For CCA and multi-environment turns, the tool needs
an explicit way to bind a permission request to a selected attached
environment before resolving relative paths.
## What Changed
- Added optional `environmentId` to `RequestPermissionsArgs`, with
`environment_id` accepted as an alias.
- Exposed `environmentId` in the `request_permissions` tool schema and
description.
- Resolve the selected environment before parsing filesystem permission
paths, so relative paths bind to the selected environment cwd.
- Route validated tool calls through
`request_permissions_for_environment` directly instead of duplicating
environment lookup in `Session::request_permissions`.
- Reject unknown environment ids with a model-facing error.
- Updated focused request-permissions and Guardian call sites for the
new optional field.
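A minimal sketch of the resolution order described above: pick the environment first, then resolve the relative path against its cwd. The `Environment` struct, `resolve_permission_path`, and the convention that the first attached environment is the primary one are assumptions for illustration.

```rust
use std::path::PathBuf;

/// Hypothetical view of an attached turn environment.
pub struct Environment {
    pub id: String,
    pub cwd: PathBuf,
}

/// Resolves `relative` against the selected environment's cwd.
/// Assumption: `environments[0]` is the primary environment, used when the
/// model omits `environmentId`.
pub fn resolve_permission_path(
    environments: &[Environment],
    environment_id: Option<&str>,
    relative: &str,
) -> Result<PathBuf, String> {
    let env = match environment_id {
        Some(id) => environments
            .iter()
            .find(|env| env.id == id)
            // Unknown ids become an error the model can read.
            .ok_or_else(|| format!("unknown environment id: {id}"))?,
        None => environments
            .first()
            .ok_or_else(|| "no environments attached".to_string())?,
    };
    Ok(env.cwd.join(relative))
}
```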
## Testing
Not run locally per instruction.
## Summary
- add analytics-only `CodexErr` telemetry to `codex_turn_event` while
leaving existing `turn_error` unchanged
- record terminal `CodexErr` facts from core immediately before the
existing turn error event is sent
- emit source-truth `codex_error_*` fields for downstream analytics,
including the raw `CodexErr::InvalidRequest(String)` message as
`codex_error_subreason`
## Validation
- `just test -p codex-analytics`
- attempted `just test -p codex-core`, but the local run timed out
across unrelated integration suites in this environment and is not being
used as validation
## Stack
1. This PR (#25850) - Key request-permission grants by environment:
stores and applies sticky permission grants per environment id.
2. #25858 - Add `environmentId` to `request_permissions`: lets the model
target a selected environment and resolves relative permission paths
against it.
3. #25862 - Propagate permission approval environment id: carries the
selected environment id through approval events, app-server requests,
TUI prompts, and delegate forwarding.
4. #25867 - Add remote request permissions integration coverage:
verifies the selected remote environment across request, approval, grant
reuse, and exec.
#25858, #25862, and #25867 are stacked on this PR and should be reviewed
after it.
## Why
Multi-environment CCA turns can attach both local and remote executors,
but request-permission grants were still effectively cwd-only. Pending
permission requests tracked a cwd, while stored turn/session grants had
no environment identity, so sticky grants could be reused through the
wrong executor context.
This makes the first permission-grant step environment-aware without
changing the external `request_permissions` payload shape: omitted
environment targeting remains bound to the primary turn environment.
## What Changed
- Store turn- and session-scoped request-permission grants by
`environment_id`.
- Keep the selected `TurnEnvironmentSelection` with pending
`request_permissions` calls so approval responses normalize and record
grants against the same environment.
- Resolve relative `request_permissions` file paths against the primary
turn environment cwd instead of deprecated `turn.cwd`.
- Apply sticky grants in `shell`, `exec_command`, and `apply_patch` by
selected environment id while still using the actual tool cwd for
cwd-relative permission materialization.
- Update Guardian and request-permissions coverage for the
environment-keyed grant behavior.
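The core idea, keying sticky grants by environment id so they cannot leak across executors, can be sketched like this. `GrantStore` and its methods are hypothetical, and a grant is reduced to a single write-root string.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical store of sticky write-root grants, keyed by environment id.
#[derive(Default)]
pub struct GrantStore {
    by_environment: HashMap<String, BTreeSet<String>>,
}

impl GrantStore {
    /// Records a grant under the environment it was approved for.
    pub fn record(&mut self, environment_id: &str, write_root: &str) {
        self.by_environment
            .entry(environment_id.to_string())
            .or_default()
            .insert(write_root.to_string());
    }

    /// A grant applies only to tool calls in the same environment.
    pub fn is_granted(&self, environment_id: &str, write_root: &str) -> bool {
        self.by_environment
            .get(environment_id)
            .is_some_and(|roots| roots.contains(write_root))
    }
}
```

With a cwd-only key, the same path string in a local and a remote executor would have shared one grant.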
## Testing
Not run locally. Added or updated focused coverage for:
- `request_permission_grants_are_environment_keyed`
- `request_permissions_tool_resolves_relative_paths_against_primary_environment`
- related Guardian/request-permissions sticky grant tests
## Why
PR #25905 intentionally adds a failing `codex-core` unit test, but its
[Bazel test on Windows
check](https://github.com/openai/codex/actions/runs/26837526950/job/79135369259)
passed. That shows the Bazel configuration introduced by #25156 is not
behaving as expected, so revert it while the configuration can be
investigated separately.
## What changed
Revert #25156 in full, restoring the previous Bazel remote
configuration, CI scripts, workflows, `rusty_v8` handling, and
documentation. This removes the shared BuildBuddy wrapper and its tests.
## Validation
Not run locally; this exact revert was prioritized for a fast rollback.
## Why
Permission profiles that extend a built-in profile should behave like
other TOML inheritance: parent entries provide defaults, and child keys
override matching fields before the profile is compiled.
That was not true for `:workspace`. Previously, a profile with `extends
= ":workspace"` seeded the compiled runtime
`PermissionProfile::workspace_write()` policy and then appended child
filesystem entries. A child override such as `":tmpdir" = "read"`
therefore left the inherited `":tmpdir" = "write"` entry in the final
policy. Since same-target `write` wins over `read` during runtime
resolution, the child override was ineffective.
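The difference between the two behaviors can be sketched with a small model. This is an illustration under assumptions, not the config crate's code: `Access`, `append_then_resolve`, and `merge_then_compile` are hypothetical, and a profile is reduced to a list of target and access pairs.

```rust
use std::collections::BTreeMap;

/// Ordered so that `Write` is stronger than `Read`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Access {
    Read,
    Write,
}

/// Previous behavior: child entries are appended to the compiled parent, and
/// the strongest access for a target wins at resolution time.
pub fn append_then_resolve(
    parent: &[(&str, Access)],
    child: &[(&str, Access)],
) -> BTreeMap<String, Access> {
    let mut resolved = BTreeMap::new();
    for (target, access) in parent.iter().chain(child) {
        let entry = resolved.entry(target.to_string()).or_insert(*access);
        *entry = (*entry).max(*access);
    }
    resolved
}

/// New behavior: merge at the TOML layer, so a child key replaces the
/// inherited value before the profile is compiled.
pub fn merge_then_compile(
    parent: &[(&str, Access)],
    child: &[(&str, Access)],
) -> BTreeMap<String, Access> {
    let mut merged = BTreeMap::new();
    for (target, access) in parent.iter().chain(child) {
        merged.insert(target.to_string(), *access);
    }
    merged
}
```

With a parent containing `":tmpdir" = "write"` and a child containing `":tmpdir" = "read"`, the first function still yields write while the second yields read.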
This also needs a clear source of truth for the built-in profiles. The
protocol-level sandbox policy constructors now define the raw built-in
filesystem entries, and both `PermissionProfile` presets and
config-profile inheritance derive from those same values.
## What Changed
- Add a canonical `FileSystemSandboxPolicy::read_only()` constructor
while keeping the read-only and workspace-write raw filesystem entries
explicit and independent.
- Derive `PermissionProfile::read_only()` from
`FileSystemSandboxPolicy::read_only()`;
`PermissionProfile::workspace_write()` continues to derive from
`FileSystemSandboxPolicy::workspace_write()`.
- Build extensible `:read-only` and `:workspace` parent profiles by
projecting those canonical sandbox policies into
`PermissionProfileToml`, then merge user overrides at the TOML layer
before compilation.
- Add config parsing support for `:slash_tmp` so the built-in
`:workspace` parent can be expressed in the same TOML-shaped filesystem
table as user profiles.
- Document that `PermissionsToml::resolve_profile()` returns an
already-merged `PermissionProfileToml`, and return that profile directly
after removing the resolved-profile wrapper.
- Extend the config test for `extends = ":workspace"` to assert that
inherited `":slash_tmp" = "write"` is preserved and that a child
`":tmpdir" = "read"` entry replaces the inherited `write` entry.
## Verification
- `just test -p codex-config`
- `just test -p codex-protocol`
- `just test -p codex-core
permissions_profiles_resolve_extends_parent_first_with_child_overrides`
- `just test -p codex-core
default_permissions_profile_can_extend_builtin_workspace`
- `just test -p codex-core`
  - Result: 2596 passed, 4 failed, 1 timed out.
  - The failures were existing sandbox/environment-sensitive tests
    unrelated to this permissions change:
    `suite::user_shell_cmd::user_shell_command_does_not_set_network_sandbox_env_var`,
    `suite::user_shell_cmd::user_shell_command_history_is_persisted_and_shared_with_model`,
    `suite::abort_tasks::interrupt_persists_turn_aborted_marker_in_next_request`,
    `suite::abort_tasks::interrupt_tool_records_history_entries`, and
    `thread_manager::tests::start_thread_uses_all_default_environments_from_codex_home`.
## Why
Bazel remote configuration was selected in several CI scripts and
workflow steps. That made the BuildBuddy tenant policy easy to duplicate
and harder to audit, especially for fork pull requests that must not use
the OpenAI tenant.
This builds on
[sluongng/buildbuddy-ci-host-routing](https://github.com/openai/codex/compare/main...sluongng:codex:sluongng/buildbuddy-ci-host-routing)
and consolidates the policy in one place.
## What to do if this breaks you
See `codex-rs/docs/bazel.md` for details. TLDR:
1. make a BuildBuddy API key and put it in `~/.bazelrc`
2. if you're an OpenAI employee, add `common
--config=buildbuddy-openai-rbe` to `user.bazelrc` in the repo root
Run `just bazel-test` to ensure it works.
Note that `just bazel-remote-test` no longer exists. To use RBE, select
a remote configuration as documented.
## What changed
- Add `.github/scripts/run_bazel_with_buildbuddy.py` as the shared Bazel
wrapper and Python library. It selects the OpenAI host only for trusted
upstream GitHub Actions runs, routes keyed fork runs to the generic
host, and falls back to local Bazel execution when no key is available.
- Move endpoint selection into explicit `.bazelrc` configurations and
update Bazel CI, query helpers, and `rusty_v8` staging to use the shared
policy. Loading-phase target-discovery queries remain local.
- Add wrapper and `rusty_v8` unit coverage, plus `just test-scripts` for
the `.github/scripts` Python tests.
- Document local Bazel usage, `user.bazelrc` setup, BuildBuddy
configurations, and CI behavior in `codex-rs/docs/bazel.md`.
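The routing policy in the wrapper reduces to a small decision table. The wrapper itself is Python; the Rust sketch below only illustrates the policy, with hypothetical names, and it assumes that a trusted upstream run without a key also falls back to local execution.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum BazelRoute {
    /// OpenAI BuildBuddy host, for trusted upstream GitHub Actions runs only.
    OpenAiHost,
    /// Generic BuildBuddy host, for fork runs that supply their own key.
    GenericHost,
    /// Plain local Bazel execution.
    Local,
}

/// Selects where Bazel should run.
pub fn select_route(has_api_key: bool, trusted_upstream_run: bool) -> BazelRoute {
    match (has_api_key, trusted_upstream_run) {
        // Assumption: without a key, nothing remote is attempted.
        (false, _) => BazelRoute::Local,
        (true, true) => BazelRoute::OpenAiHost,
        (true, false) => BazelRoute::GenericHost,
    }
}
```

Keeping the decision in one function is what makes the fork-PR rule auditable: no caller can reach the OpenAI host without passing the trusted-run check.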
## Validation
- `just test-scripts`
- `bash -n .github/scripts/run-bazel-ci.sh
.github/scripts/run-bazel-query-ci.sh
.github/scripts/run-argument-comment-lint-bazel.sh
scripts/list-bazel-clippy-targets.sh`
- `python3 -m py_compile .github/scripts/run_bazel_with_buildbuddy.py
.github/scripts/test_run_bazel_with_buildbuddy.py
.github/scripts/test_rusty_v8_bazel.py
.github/scripts/rusty_v8_bazel.py`
- `ruff check .github/scripts/run_bazel_with_buildbuddy.py
.github/scripts/test_run_bazel_with_buildbuddy.py
.github/scripts/test_rusty_v8_bazel.py
.github/scripts/rusty_v8_bazel.py`
## Summary
- skip startup websocket prewarm setup when the model client has
Responses-over-WebSocket disabled
- avoid making HTTP-only sessions build prewarm prompt/tool state that
cannot produce a reusable websocket session
## Why
Recent macOS test flakes were timing out while waiting for first-turn
events in HTTP-only core tests. Startup prewarm is only useful for
websocket-capable providers, but it was scheduled for every session. For
HTTP-only test providers this added unnecessary async startup work
before the regular turn could reach the mocked response flow.
## Testing
- bazel test //codex-rs/core:core-all-test
--test_filter=suite::auto_review::remote_model_override_uses_catalog_model_for_strict_auto_review
--test_output=errors
- bazel test //codex-rs/core:core-all-test
--test_filter=suite::request_permissions_tool::approved_folder_write_request_permissions_unblocks_later_apply_patch
--test_output=errors
## Why
Cargo's libgit2 transport has intermittently failed while fetching git
dependencies with nested submodules.
[#25644](https://github.com/openai/codex/pull/25644) applied
`CARGO_NET_GIT_FETCH_WITH_CLI=true` to the main Rust release build after
macOS SecureTransport/libgit2 failures while cloning `libwebrtc`'s
nested `libyuv` submodule. Similar flakes can affect other Cargo-bearing
Rust jobs.
## What changed
Configure `CARGO_NET_GIT_FETCH_WITH_CLI=true` at workflow scope for the
remaining Cargo-bearing Rust workflows:
- fast Rust CI and `cargo-deny`
- reusable Windows and argument-comment-lint release workflows
- `rusty-v8-release` and `v8-canary` Cargo builds and smoke tests
The full Rust CI, reusable nextest workflow, and primary Rust release
build already had the override. Bazel-only workflows are unchanged
because they use a different dependency fetch path.
## Validation
- Parsed all `.github/workflows/*.yml` files as YAML.
- Scanned Cargo-bearing workflows to confirm they configure
`CARGO_NET_GIT_FETCH_WITH_CLI`.
## Why
`Runtime::block_on` executes the top-level future on the caller's OS
thread, not on one of Tokio's worker threads. That matters for the
interactive CLI because the Tokio runtime already configures larger
worker stacks, while the process main thread can still have a smaller
platform default stack.
This showed up as a `/clear` crash on macOS: starting a fresh TUI thread
reloads config, and the stack-heavy TOML deserialization path can
overflow before the new session is actually started.
## What Changed
- Run the regular `arg0_dispatch_or_else` async entrypoint on a named
`codex-main` thread.
- Give that thread the same `TOKIO_WORKER_STACK_SIZE_BYTES` stack budget
already used for Tokio worker threads.
- Keep `Arg0DispatchPaths` and the arg0 alias guard lifetime behavior
the same.
- Resume panics from the spawned main thread so panic behavior is
preserved.
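
A minimal std-only sketch of the pattern, assuming a hypothetical `run_on_large_stack` helper and a placeholder stack size (the real change reuses `TOKIO_WORKER_STACK_SIZE_BYTES`, whose value is not shown here):

```rust
use std::panic;
use std::thread;

// Placeholder value for illustration; not the project's actual constant.
const MAIN_THREAD_STACK_SIZE_BYTES: usize = 16 * 1024 * 1024;

/// Runs `f` on a named thread with a larger stack and returns its result.
/// If `f` panics, the panic is resumed on the calling thread so the
/// process-level panic behavior stays the same.
fn run_on_large_stack<T, F>(f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let handle = thread::Builder::new()
        .name("codex-main".to_string())
        .stack_size(MAIN_THREAD_STACK_SIZE_BYTES)
        .spawn(f)
        .expect("failed to spawn codex-main thread");
    match handle.join() {
        Ok(value) => value,
        Err(payload) => panic::resume_unwind(payload),
    }
}
```

In this sketch the closure passed to `spawn` must be `Send`, which is a constraint of `std::thread` itself.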
## Verification
- `cargo check -p codex-cli` currently fails because the top-level
CLI/TUI future is not `Send` under the new thread boundary.
## Summary
Keep the full `TestCodex` harness alive in plugin integration tests
instead of returning only the `CodexThread`.
## Why
The helper was moving a temporary `codex_home` into `TestCodex`, then
immediately dropping the harness and returning only the thread. For
plugin MCP tests, the MCP server cwd is inside that temporary home. If
the temp directory is removed while MCP startup is still racing, the
server launch can fail with `No such file or directory`.
Keeping the harness in scope keeps the temp home alive for the test
duration and removes the lifetime race behind the recent
`explicit_plugin_mentions_inject_plugin_guidance` flake.
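
The lifetime issue can be illustrated with a std-only sketch. `TempHome` and `Harness` are hypothetical stand-ins for the temporary `codex_home` and `TestCodex`; the real types differ.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Stand-in for a temporary home directory that is deleted on drop.
struct TempHome {
    path: PathBuf,
}

impl TempHome {
    fn create(name: &str) -> io::Result<Self> {
        let path = std::env::temp_dir().join(format!("{name}-{}", std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }
}

impl Drop for TempHome {
    fn drop(&mut self) {
        // Best-effort cleanup, as a temp-dir guard would do.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Stand-in for the test harness: it owns the temp home, so the directory
/// lives exactly as long as the harness does.
struct Harness {
    home: TempHome,
}

impl Harness {
    fn cwd(&self) -> &Path {
        &self.home.path
    }
}

/// Returns the whole harness so callers keep the temp home alive.
fn build_harness(name: &str) -> io::Result<Harness> {
    Ok(Harness { home: TempHome::create(name)? })
}
```

Returning only a field extracted from the harness would drop `TempHome` at the end of the helper, which is the window in which a server launched with that cwd can fail.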
## Validation
- `just fmt`
- `just test -p codex-core
explicit_plugin_mentions_inject_plugin_guidance`
## Why
`/clear` starts a fresh thread with `InitialHistory::Cleared`, which
re-enters the thread/session startup path. That path now builds large
async futures through `ThreadManagerState::spawn_thread_with_source`,
`Codex::spawn`, and `Session::new`. Separately, TUI config rebuilds for
cwd and permission-profile changes build a similarly heavy
`ConfigBuilder::build()` future inside the app task. In debug and Bazel
runs, those call chains can put enough state on the caller stack to
abort before startup or config refresh completes.
This change keeps the behavior the same while moving the heaviest future
frames off the caller stack.
## What changed
- Box `Codex::spawn(...)` in `codex-rs/core/src/thread_manager.rs`
before awaiting it from `spawn_thread_with_source`.
- Box `Session::new(...)` in `codex-rs/core/src/session/mod.rs` before
awaiting it from `Codex::spawn_internal`.
- Route `ConfigBuilder::build()` through a small `tokio::spawn` helper
in `codex-rs/tui/src/app/config_persistence.rs` so cwd and
permission-profile config rebuilds run on a runtime worker stack while
preserving error context.
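
To show why boxing helps, the following std-only sketch compares future sizes. `heavy_startup` is a hypothetical stand-in for a large startup future such as `Session::new(...)`; the 16 KiB buffer is an arbitrary illustration.

```rust
use std::future::Future;
use std::mem::size_of_val;

/// Stand-in for a large startup future: the buffer is live across an
/// `.await`, so it is stored inside the future's state.
async fn heavy_startup() -> u8 {
    let scratch = [1u8; 16 * 1024];
    std::future::ready(()).await;
    scratch[0]
}

/// Awaiting inline embeds the callee's whole state in the caller's future.
async fn await_inline() -> u8 {
    heavy_startup().await
}

/// Boxing first moves the callee's state to the heap, so the caller's
/// future only holds a pointer.
async fn await_boxed() -> u8 {
    Box::pin(heavy_startup()).await
}

/// Size in bytes of a future value, without polling it.
fn future_size<F: Future>(future: &F) -> usize {
    size_of_val(future)
}
```

Comparing `future_size(&await_inline())` with `future_size(&await_boxed())` shows the inline caller carrying the full buffer while the boxed caller stays small.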
## Verification
CI is running on the PR.
No new targeted tests were added. This is a mechanical stack-pressure
reduction that keeps the existing behavior and error propagation intact.
Stack split from #25708. Original PR intentionally left open. This fifth
PR adds coverage that a remotely selected multi-agent runtime is applied
when the model is selected before the first turn.
Stack split from #25708. Original PR intentionally left open. This
fourth PR adds coverage that remote model multi-agent runtime selectors
override local feature flag defaults.
## Why
Follow-up to #25722. Startup prewarm builds a preview `TurnContext`
before the first real turn so it can precompute the initial prompt and
tool surface. After the per-thread runtime work landed, that preview
path still recomputed multi-agent mode from `model_info` and feature
defaults instead of reusing the runtime the session had already resolved
from persisted metadata or inheritance.
That could leave the prewarmed session primed for a different
multi-agent mode than the first real turn, which is especially risky
because collaboration tool exposure depends on
`turn_context.multi_agent_version`.
## What changed
- In the `TurnMultiAgentRuntime::Preview` path, prefer
`Session::multi_agent_version()` when it is already known.
- Only fall back to `model_info.multi_agent_version` and feature
defaults when the session has not resolved a runtime yet.
- Keep preview mode read-only: this still avoids storing a runtime
during startup prewarm.
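
The resolution order can be sketched as a pure function. The enum variants and the function name are hypothetical; only the precedence (session, then model info, then feature default) comes from the description above.

```rust
/// Hypothetical stand-in for the multi-agent runtime version.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MultiAgentVersion {
    V1,
    V2,
}

/// Picks the version for a preview turn context without storing anything:
/// the session's already-resolved runtime wins, then the model's selector,
/// then the feature-flag default.
fn preview_multi_agent_version(
    session_resolved: Option<MultiAgentVersion>,
    model_info: Option<MultiAgentVersion>,
    feature_default: MultiAgentVersion,
) -> MultiAgentVersion {
    session_resolved.or(model_info).unwrap_or(feature_default)
}
```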
## Testing
- Not run (small runtime-selection follow-up)
Stack split from #25708. Original PR intentionally left open. This third
PR resolves the effective per-thread multi-agent runtime from persisted
metadata, inherited runtime, and current model selection.
Stack split from #25708. Original PR intentionally left open. This
second PR persists multi-agent runtime metadata through thread creation,
rollout recording, and thread storage.
Stack split from #25708. Original PR intentionally left open. This first
PR adds the multi-agent runtime metadata types and catalog plumbing used
by the rest of the stack.
## Summary
- teach rollout search to return precomputed snippets for compressed
rollouts
- reuse those snippets in local thread search instead of reopening
matching compressed files
- keep the no-`rg` fallback single-pass and add regression coverage for
the compressed path
## Why
`thread/search` currently decodes matching compressed rollouts twice:
once to discover the matching path and again to extract the snippet
shown in results. That defeats a meaningful part of the compressed-read
optimization work.
## Impact
Compressed rollout hits now pay one decode pass on the search path while
plain `.jsonl` hits keep the existing ripgrep-driven flow.
## Validation
- `just test -p codex-rollout`
- `just test -p codex-thread-store`
- `just fix -p codex-rollout`
- `just fix -p codex-thread-store`
- `just fmt`
## Summary
- Validate skill base name length before plugin namespacing.
- Bound the composed `plugin:skill` qualified name to 128 characters.
- Keep plugin skill runtime names in the existing `plugin:skill` form.
- Add regression tests for the max qualified-name boundary and rejection
path.
## Root Cause
Plugin skills are represented as `plugin_name:skill_name`, but the
loader previously applied the 64-character skill name limit after adding
the plugin namespace. Moving that check to the base name fixes valid
plugin skills with longer namespaces, and the separate 128-character
qualified-name limit keeps model-visible skill names bounded.
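
A minimal sketch of the two checks, assuming a hypothetical `qualified_skill_name` helper and assuming the limits count Unicode characters (the loader's actual function names and error types are not shown in this description):

```rust
const MAX_SKILL_BASE_NAME_CHARS: usize = 64;
const MAX_QUALIFIED_NAME_CHARS: usize = 128;

/// Validates the base name first, then bounds the composed
/// `plugin:skill` name. Returns the runtime name on success.
fn qualified_skill_name(plugin: Option<&str>, base: &str) -> Result<String, String> {
    if base.chars().count() > MAX_SKILL_BASE_NAME_CHARS {
        return Err(format!(
            "skill name exceeds {MAX_SKILL_BASE_NAME_CHARS} characters"
        ));
    }
    let qualified = match plugin {
        Some(plugin) => format!("{plugin}:{base}"),
        None => base.to_string(),
    };
    if qualified.chars().count() > MAX_QUALIFIED_NAME_CHARS {
        return Err(format!(
            "qualified skill name exceeds {MAX_QUALIFIED_NAME_CHARS} characters"
        ));
    }
    Ok(qualified)
}
```

With this ordering, a short skill name under a long plugin namespace passes the 64-character check that it previously failed once the namespace was prepended.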
## Validation
- `just fmt`
- `just test -p codex-core-skills plugin_skill_name_length_limit`
- `git diff --check`
## Summary
- Move plugin discoverable recommendation filtering from `codex-core`
into `codex-core-plugins` behind `ToolSuggestPluginDiscoveryInput`.
- Keep `codex-core` as a thin adapter from `Config` to the core-plugins
API and back to `DiscoverablePluginInfo`.
- Keep the existing discoverable allowlist private to the core-plugins
implementation.
## Validation
- `just fmt`
- `just test -p codex-core list_tool_suggest_discoverable_plugins`
- `git diff --check`
- Read-only subagent review: no findings
## Summary
- cache the global remote plugin catalog when remote plugin listing runs
and warm it during startup
- use the cached remote catalog in plugin install recommendations with
canonical `plugin@openai-curated-remote` ids
- reuse the session `PluginsManager` for plugin recommendations so
remote cache state is visible on the recommend path
- skip core installed-state verification for remote plugin install
suggestions while leaving local plugin and connector verification
unchanged
## Testing
- `just fmt`
- `git diff --check`
- `cargo test -p codex-core
list_tool_suggest_discoverable_plugins_includes_cached_remote_global_plugins`
- `cargo test -p codex-core
remote_plugin_install_suggestions_skip_core_installed_verification`
- `cargo test -p codex-app-server
plugin_list_includes_remote_marketplaces_when_remote_plugin_enabled`
Earlier focused checks on the same branch: codex-tools TUI filter test,
request_plugin_install tests, and a codex-app-server build.
## Summary
- add `--json` output to `codex plugin list` with `installed` and
`available` arrays
- add `--available` for JSON output only; using it without `--json` is
rejected
- keep the existing non-JSON table output unchanged
- add CLI coverage for JSON installed/available output and the
`--available`/`--json` requirement
## Validation
- `just test -p codex-cli plugin_list`
- `just fix -p codex-cli`
- `git diff --check`
Note: `just fmt` ran Rust formatting first, then failed in the Python
ruff step because `openai-codex-cli-bin==0.132.0` has no wheel for this
Linux platform.
## Summary
Enterprise users can have an effective monthly credit limit, but Codex
`/status` currently drops that metadata from the account-usage response.
This change adds the optional `spend_control.individual_limit`
projection to the existing rate-limit snapshot flow. The backend client
reads the monthly limit, app-server exposes it as `individualLimit`, and
the TUI renders a `Monthly credit limit` row through the existing
progress-bar renderer.
When the backend does not return an effective monthly limit, existing
rate-limit behavior is unchanged.
## Existing backend state
The account-usage backend already returns the effective monthly limit
and current usage together:
```json
{
"spend_control": {
"reached": false,
"individual_limit": {
"limit": "25000",
"used": "8000",
"remaining": "17000",
"used_percent": 32,
"remaining_percent": 68,
"reset_after_seconds": 86400,
"reset_at": 1778137680
}
}
}
```
Before this change, Codex projected rolling `primary` and `secondary`
windows plus `credits`. It ignored `spend_control.individual_limit`, so
app-server clients and `/status` could not render the monthly cap.
The updated flow is:
```text
account usage backend
-> backend-client reads spend_control.individual_limit
-> existing rate-limit snapshot carries optional individual_limit
-> app-server exposes optional individualLimit
-> TUI renders Monthly credit limit
```
## App-server contract
`account/rateLimits/read` and sparse `account/rateLimits/updated`
notifications now include an additive nullable
`rateLimits.individualLimit` field:
```json
{
"individualLimit": {
"limit": "25000",
"used": "8000",
"remainingPercent": 68,
"resetsAt": 1778137680
}
}
```
In an `account/rateLimits/read` response, `null` means no monthly limit
is available. `account/rateLimits/updated` remains a sparse rolling
notification: clients merge available values into their most recent
`account/rateLimits/read` snapshot or refetch. Nullable account metadata
in a rolling notification does not clear a previously observed value.
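
One way a client might apply these merge rules is sketched below. The struct and method names are hypothetical and only model the `individualLimit` field; a real client would carry the other rate-limit fields as well.

```rust
/// Client-side view of `rateLimits.individualLimit` (illustrative shape).
#[derive(Clone, Debug, PartialEq)]
struct IndividualLimit {
    limit: String,
    used: String,
    remaining_percent: u8,
    resets_at: i64,
}

/// Cached rate-limit state, reduced to the one field discussed here.
#[derive(Clone, Debug, Default, PartialEq)]
struct RateLimits {
    individual_limit: Option<IndividualLimit>,
}

impl RateLimits {
    /// `account/rateLimits/read` is authoritative: `None` clears the limit.
    fn apply_read(&mut self, snapshot: RateLimits) {
        *self = snapshot;
    }

    /// `account/rateLimits/updated` is sparse: a missing value keeps the
    /// previously observed one instead of clearing it.
    fn apply_update(&mut self, update: RateLimits) {
        if update.individual_limit.is_some() {
            self.individual_limit = update.individual_limit;
        }
    }
}
```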
## Design decisions
- Extend the existing rate-limit snapshot instead of introducing a
separate request or wire-level update protocol.
- Keep the Codex projection narrow: `/status` needs the effective limit,
current usage, remaining percentage, and reset timestamp.
- Render the monthly row through the existing progress-bar renderer,
with one optional detail line for `8,000 of 25,000 credits used`.
- Keep the backend response optional so existing accounts and older
usage states preserve their current behavior.
- Preserve cached monthly metadata when sparse rolling notifications
omit it. Live account-usage reads remain authoritative and can clear a
removed limit.
## Visual evidence
```text
Monthly credit limit: [██████████████░░░░░░] 68% left (resets 07:08 on 7 May)
8,000 of 25,000 credits used
```
Snapshot:
`codex-rs/tui/src/status/snapshots/codex_tui__status__tests__status_snapshot_includes_enterprise_monthly_credit_limit.snap`
## Testing
Tests: generated app-server schema verification, protocol tests,
backend-client tests, app-server integration coverage, TUI snapshot
coverage, formatting, and workspace lint cleanup.
## Why
Codex Review now supports repository-specific review rules in AGENTS.md.
Adding the review prompts there makes the guidance available as
repository review rules next to the code it governs while keeping the
existing local review skills intact.
## What changed
- Added a `## Code Review Rules` section to `AGENTS.md` with the
existing review prompts for model context, breaking changes, test
authoring, and change size.
- Preserved the existing `.codex/skills/code-review*` skill files.
## Verification
- `git diff --check origin/main...HEAD`
## Why
The root formatting entrypoints could drift: `just fmt` did not format
the Justfile itself, and the CI-facing check recipe only checked Python
scripts instead of matching everything formatted by `just fmt`.
## What changed
- Add a shared cross-platform Python formatter driver used by both `just
fmt` and `just fmt-check`.
- Run Justfile, Rust, Python SDK, and internal-script formatter groups
concurrently while buffering each formatter group's output until it
finishes.
- Log formatter starts immediately, then print each formatter group's
labeled output when it completes.
- Keep the SDK lint-fix and Ruff formatting passes ordered, with source
comments explaining their distinct roles and the check-mode equivalents.
- Run Ruff through shared `uv run --no-sync --with ruff` overlays so
formatting works on clean glibc Linux checkouts without installing the
platform-specific SDK runtime wheel.
- Show `fmt-check` help text in `just -l` and simplify CI to call the
shared driver through `just fmt-check`.
- Pin the general CI workflow to `just@1.51.0` so its formatter agrees
with the checked-in Justfile.
- Add regression coverage for the thin Just recipes and the driver's
formatter graph.
## Validation
- `just fmt`
- `just fmt-check`
- `python3 -m pytest
sdk/python/tests/test_artifact_workflow_and_binaries.py -k 'root_fmt or
root_format' -q`
- `pnpm run format`
- `git diff --check`
- `just -l | rg -n '^ fmt|fmt-check'`
- `uvx --from uv==0.7.22 uv run --frozen --project sdk/python --no-sync
--with ruff ruff check --diff sdk/python`
## Why
Remote control enrollment authorizes a desktop server, but app-server v2
did not expose the follow-up pairing operation needed to mint a
short-lived controller pairing artifact from that enrolled server.
Clients need a narrow RPC that starts pairing without exposing the
backend `serverId` or conflating pairing with websocket connection
state.
Issue: N/A; internal remote-control pairing API change.
## What Changed
Added experimental app-server v2 `remoteControl/pairing/start` with
`manualCode` input and `pairingCode`, nullable `manualPairingCode`,
`environmentId`, and Unix-seconds `expiresAt` output. The method
serializes under its own `global("remote-control-pairing")` scope and is
documented in `app-server/README.md`.
Extended the remote-control transport with private `/server/pair`
request/response types and normalized `pair_url` handling. Pairing uses
the current enrolled server bearer, refreshes that bearer when needed,
keeps backend `server_id` private, validates returned `server_id` and
`environment_id` against the current enrollment, and preserves backend
status/header/body context for failures and malformed responses.
Wired the request through `RemoteControlRequestProcessor` and
`MessageProcessor`, mapping unavailable/disabled pairing to
`invalid_request` and backend failures to internal errors.
## Verification
- `just test -p codex-app-server-transport`
- `just test -p codex-app-server
remote_control_pairing_start_returns_pairing_artifacts`
## Summary
- Moves the existing `codex-cloud-requirements` crate to
`codex-cloud-config`.
- Updates workspace dependencies and imports to the new crate name.
- Intentionally keeps runtime behavior unchanged: this still fetches the
legacy cloud requirements endpoint.
## Details
This PR exists to make the lineage obvious before the bundle migration.
GitHub should show the old `codex-rs/cloud-requirements/src/lib.rs`
implementation as moved to `codex-rs/cloud-config/src/lib.rs`, rather
than as unrelated new code.
The follow-up PR adapts this moved crate to the new config bundle API
and switches runtime consumers over.
## Summary
Remove the dead experimental `persistExtendedHistory` app-server flag
and collapse rollout persistence to the single policy app-server already
used.
## What Changed
- Removed `persistExtendedHistory` from v2 thread start/resume/fork
params and deleted its deprecation notice path.
- Removed the persistence-mode enums and plumbing through core, rollout,
and thread-store.
- Made rollout filtering mode-free, keeping the existing limited
persisted-history behavior.
## Test Plan
- `just write-app-server-schema`
- `cargo nextest run --no-fail-fast -p codex-app-server-protocol
schema_fixtures`
- `cargo nextest run --no-fail-fast -p codex-app-server
thread_shell_command_history_responses_exclude_persisted_command_executions`
- `cargo nextest run --no-fail-fast -p codex-rollout -p
codex-thread-store`
- final `rg` for removed flag/type names
## Stack
1. Parent PR: #18240 uses named MITM permissions config.
2. This PR wires managed MITM CA trust into spawned child processes.
## Why
When Codex terminates HTTPS for limited mode or MITM hooks, child HTTPS
clients need to trust Codex's managed MITM CA. Exporting proxy URLs
alone is not enough, but blindly replacing user CA settings would be
wrong: it can break custom enterprise/test roots, leak unreadable CA
files into generated bundles, or make the child env disagree with its
sandbox policy.
## Summary
1. Build immutable managed CA bundles under `$CODEX_HOME/proxy` that
include native roots, the managed MITM CA, and only inherited or
command-scoped CA bundles the child is allowed to read.
2. Export curated CA env vars alongside managed proxy env vars while
preserving user CA override semantics, including nested Codex
`SSL_CERT_FILE` precedence.
3. Thread generated CA bundle paths into child sandbox readable roots,
including debug sandbox execution, so the exported env vars work inside
sandboxed commands.
4. Remove only Codex-generated MITM CA bundle env when a child
intentionally drops managed proxying for escalation or no-proxy retry.
5. Document the managed CA bundle behavior and cover env injection,
per-child bundle generation, sandbox readable roots, and no-proxy
cleanup in tests.
## Validation
1. Ran `just test -p codex-network-proxy`.
2. Ran `just test -p codex-protocol`.
3. Ran `just fix -p codex-network-proxy -p codex-protocol`.
4. Tried focused `codex-core` validation, but the crate currently fails
to compile in `core/tests/suite/guardian_review.rs` because an existing
`Op::UserInput` initializer is missing `additional_context`.
---------
Co-authored-by: Eva Wong <evawong@openai.com>
## Why
Fixes openai/codex#20944.
Desktop side chats are intentionally ephemeral and pathless. They can
still accept live turns while loaded, but after a reload there is no
persisted rollout to resume. In the reported failure mode, Desktop could
send `$CODEX_HOME` as the resume/fork path for one of these pathless
side chats.
`thread/resume` and `thread/fork` prefer an explicit `path` over
`threadId`, and rollout path lookup only checked that a candidate
existed. That let `$CODEX_HOME` pass as a rollout path, so the later
rollout reader tried to open a directory and surfaced the low-level `Is
a directory` error.
## What Changed
- Reject explicit rollout paths that resolve to a directory or other
non-file before attempting to read rollout history.
- Make `codex_rollout::existing_rollout_path` return only plain or
compressed rollout candidates that are actual files.
- Add an app-server regression test that creates an ephemeral fork, runs
a turn while the side thread is loaded, simulates reload, then verifies
both `thread/resume` and `thread/fork` reject `$CODEX_HOME` with `path
is a directory` instead of the OS-level directory-read error.
- Rebase over the `TestAppServer` rename and update the remaining stale
test harness call sites to use `TestAppServer` with `app_server` local
variables.
Relevant code:
- `thread-store/src/local/read_thread.rs` validates explicit rollout
paths before rollout reading:
25b47c8f42/codex-rs/thread-store/src/local/read_thread.rs (L146-L165)
- `rollout/src/compression.rs` now requires file metadata for plain and
compressed rollout candidates:
25b47c8f42/codex-rs/rollout/src/compression.rs (L940-L950)
- The repro test covers the pathless ephemeral side-chat reload case:
25b47c8f42/codex-rs/app-server/tests/suite/v2/thread_fork.rs (L774-L886)
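
The validation idea can be sketched with `std::fs::metadata`. `validate_rollout_path` is a hypothetical name, and the second error message is illustrative; only the `path is a directory` wording comes from the description above.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Rejects explicit rollout paths that are not regular files before any
/// attempt to read rollout history from them.
fn validate_rollout_path(path: &Path) -> io::Result<()> {
    // `metadata` follows symlinks and fails if the path does not exist.
    let metadata = fs::metadata(path)?;
    if metadata.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "path is a directory",
        ));
    }
    if !metadata.is_file() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "path is not a regular file",
        ));
    }
    Ok(())
}
```

Checking file type up front is what replaces the low-level OS error with a message that names the actual problem.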
## Verification
- `just test -p codex-app-server
pathless_ephemeral_thread_rejects_codex_home_path_after_reload`
## Why
Production Codex binaries are stripped for distribution, which leaves
crashes and samples from released builds without the symbols needed for
useful stack traces. Publish symbols as separate release assets so
production artifacts stay small while released builds remain
symbolicateable.
## What changed
- Add `.github/scripts/archive-release-symbols-and-strip-binaries.sh` to
package platform-native symbols into `codex-symbols-<artifact>.tar.gz`
assets while stripping the corresponding Unix binaries before signing.
- Build release binaries with full debug information before producing
distribution artifacts.
- Publish macOS `.dSYM` bundles, Linux `.debug` files with
`.gnu_debuglink`, and Windows `.pdb` files.
- Strip Linux `bwrap` before computing its packaged-resource digest, but
intentionally omit `bwrap` from symbol archives.
- Preserve symbols artifacts in the unsigned macOS promotion flow.
## Verification
- Ran `shellcheck` and `bash -n` on
`.github/scripts/archive-release-symbols-and-strip-binaries.sh`.
- Parsed the modified workflow YAML files and ran `git diff --check`.
- Built a macOS release smoke binary and verified that the archived
`.dSYM` contains DWARF application source information and has the same
UUID as the stripped production binary.
- Built Linux smoke binaries and verified that the symbol archive
contains `codex.debug`, excludes `bwrap.debug`, leaves the expected
`.gnu_debuglink` in `codex`, and does not mutate the separately stripped
`bwrap` digest.
- Staged a Windows smoke archive and verified that it contains the
expected `.pdb` file.
## Why
The TUI shortcut overlay used static labels for `Tab` and `Ctrl+C`, even
though both keys change behavior while a task is running. That made the
visible help misleading: idle `Tab` submits rather than queues, and
active-turn `Ctrl+C` interrupts rather than exits.
Closes #25531.
Closes #25564.
## What Changed
- Pass task-running state into the shortcut overlay renderer.
- Render `Tab` as `submit message` while idle and `queue message` while
work is running.
- Render `Ctrl+C` as `exit` while idle and `interrupt` while work is
running.
- Add snapshot coverage for the active-work shortcut overlay and update
idle overlay snapshots.
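
The state-dependent labels amount to a small mapping. The function name below is hypothetical; the label strings match the overlay text listed in the test steps.

```rust
/// Returns the (`Tab`, `Ctrl+C`) hint text for the shortcut overlay,
/// depending on whether a task is currently running.
fn shortcut_labels(task_running: bool) -> (&'static str, &'static str) {
    if task_running {
        ("tab to queue message", "ctrl + c to interrupt")
    } else {
        ("tab to submit message", "ctrl + c to exit")
    }
}
```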
## How to Test
1. Start Codex and open the shortcut overlay with `?` while no task is
running.
2. Confirm the overlay shows `tab to submit message` and `ctrl + c to
exit`.
3. Start a task, then open or keep the shortcut overlay visible while
work is running.
4. Confirm the overlay shows `tab to queue message` and `ctrl + c to
interrupt`.
5. Type a follow-up prompt during active work and press `Tab`; confirm
it queues rather than submitting immediately.
Targeted tests:
- `just test -p codex-tui footer_snapshots`
- `just test -p codex-tui footer_mode_snapshots`
## Validation Notes
`just test -p codex-tui` currently has two unrelated guardian
feature-flag test failures on this base:
-
`app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`
-
`app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
`just argument-comment-lint codex-rs/tui/src/bottom_pane/footer.rs`
could not run locally because the prebuilt wrapper requires `dotslash`;
the touched Rust diff was manually inspected for opaque positional
literals.
Deferred tools need to be searchable even when they are not implemented
inside `codex-core`. Extension-provided tools can be registered for
later discovery, but the search metadata path was still owned by
core-specific runtime hooks, which meant the shared `ToolExecutor`
abstraction could not describe how a deferred extension tool should
appear in `tool_search`.
## Changes
- Move `ToolSearchEntry` and `ToolSearchInfo` into `codex-tools` and
re-export them from the shared tools crate.
- Add a default `ToolExecutor::search_info` implementation that derives
loadable tool-search metadata from function and namespace specs.
- Forward search metadata through extension adapters and exposure
overrides while keeping custom search text/source metadata for dynamic,
MCP, and multi-agent tools.
- Remove the old core-local `tool_search_entry` module now that search
metadata lives with the shared executor APIs.
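
The default-method pattern can be sketched as follows. The trait and struct names echo the ones in this PR, but every field and signature here is an assumption made for illustration, not the `codex-tools` API.

```rust
/// Illustrative shape only; the real `ToolSearchInfo` differs.
#[derive(Debug, PartialEq)]
struct ToolSearchInfo {
    name: String,
    search_text: String,
}

/// Illustrative stand-in for a function or namespace spec.
struct ToolSpec {
    name: String,
    description: String,
}

trait ToolExecutor {
    fn spec(&self) -> ToolSpec;

    /// Default: derive search metadata from the spec, so extension tools
    /// are searchable without core-specific hooks.
    fn search_info(&self) -> ToolSearchInfo {
        let spec = self.spec();
        ToolSearchInfo {
            search_text: format!("{} {}", spec.name, spec.description),
            name: spec.name,
        }
    }
}

/// Uses the derived default.
struct ExtensionTool;

impl ToolExecutor for ExtensionTool {
    fn spec(&self) -> ToolSpec {
        ToolSpec { name: "web.run".into(), description: "search the web".into() }
    }
}

/// Overrides the default to keep custom search text.
struct CustomTool;

impl ToolExecutor for CustomTool {
    fn spec(&self) -> ToolSpec {
        ToolSpec { name: "custom".into(), description: "generic".into() }
    }

    fn search_info(&self) -> ToolSearchInfo {
        ToolSearchInfo { name: "custom".into(), search_text: "hand-written keywords".into() }
    }
}
```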
## Testing
- Added `deferred_extension_tools_are_discoverable_with_tool_search`
coverage in `core/src/tools/spec_plan_tests.rs`.
## Why
#25701 renamed the app-server test harness to `TestAppServer`, but it
raced with #25681, which added a new `plugin_list` test call site still
using the old `McpProcess` name. Once both changes met on `main`,
app-server test builds failed before running the suite because
`McpProcess` no longer exists in that scope.
This PR fixes that CI break by updating the remaining stale call site to
the renamed helper.
## What Changed
- Replaced the `McpProcess::new(...)` use in
`codex-rs/app-server/tests/suite/v2/plugin_list.rs` with
`TestAppServer::new(...)`.
- Renamed the local variable from `mcp` to `app_server` at the same call
site to match the helper rename.
Relevant code:
aadd9c999b/codex-rs/app-server/tests/suite/v2/plugin_list.rs (L234-L246)
## Verification
Not run locally; this is a compile fix for the app-server test harness
rename.
## Summary
- opt the extension-backed standalone `web.run` tool into parallel tool
execution
- update the existing extension registration test to assert that the
tool advertises parallel-call support
## Why
The standalone web-search API endpoint now supports parallel requests.
The extension executor still inherited the shared serial default,
causing multiple `web.run` calls to acquire the exclusive runtime lock.
## Impact
Models that emit multiple standalone web-search calls can now execute
them concurrently when model-level parallel tool calls are enabled.
## Validation
- `just fmt`
- `just test -p codex-web-search-extension`
- `git diff --check origin/main...HEAD`
This PR brought to you via VS Code rather than Codex...
- opened `codex-rs/app-server/tests/common/mcp_process.rs`
- put the cursor on `McpProcess`
- hit `F2` and renamed the symbol to `TestAppServer`
- went to the file tree
- hit enter and renamed `mcp_process.rs` to `test_app_server.rs`
- ran **Save All Files** from the Command Palette
- ran `just fmt`
The End
(Admittedly, most of the local variables for `TestAppServer` are still
named `mcp`, though.)
## Why
Python contributions in this repository should target the declared
Python 3 runtime instead of carrying Python 2 compatibility patterns
forward. When compatibility across Python 3 point releases matters,
contributors need a consistent source of truth for the minimum supported
version.
## What changed
- Added Python development guidance to `AGENTS.md` stating that the
repository uses Python 3+ and should not use the `__future__` module.
- Documented that contributors should check the nearest `pyproject.toml`
`requires-python` field when evaluating Python 3 point-release
compatibility.
## Testing
Not run (guidance-only change).
## Summary
- describe omitted code-mode tools as deferred nested tools instead of
MCP/app tools
- update the prompt-description assertion to match
## Why
Deferred dynamic tools are also callable through `tools` and
discoverable in `ALL_TOOLS`, so the previous MCP/app-specific wording
was too narrow.
## Validation
- `just fmt`
- `just test -p codex-code-mode`
- `git diff --check`
## Why
New unit test modules should follow one consistent layout so
implementation files stay focused and test suites remain easy to locate,
without creating cleanup churn in existing inline test modules.
## What changed
- Added `AGENTS.md` guidance requiring new test modules to use separate
sibling `*_tests.rs` files with an explicit `#[path = "..._tests.rs"]`
attribute.
- Clarified that existing inline `#[cfg(test)] mod tests { ... }`
modules should not be moved solely to follow the new convention.
## Validation
- Ran `git diff --check`.
## Summary
Add counter telemetry for the local rollout compression worker so we can
see when it runs, why it skips, and how individual file/materialization
paths resolve.
## Changes
- Emit `codex.rollout_compression.run` with statuses for start,
completion, failure, duplicate-run skip, and missing runtime skip.
- Emit `codex.rollout_compression.file` outcomes for scanned,
compressed, skipped, and failed compression candidates.
- Emit `codex.rollout_compression.temp_cleanup` and
`codex.rollout_compression.materialize` counters for cleanup and
decompression paths.
## Validation
- `just fmt`
- `just test -p codex-rollout`
- `just fix -p codex-rollout`
## Why
When unified exec is configured to launch through the zsh fork, local
commands should not let the model override the shell binary with the
`shell` parameter. The configured zsh fork is the mechanism that makes
`execv(2)` interception reliable, so exposing `shell` for local zsh-fork
execution would create a confusing API surface and undermine the
composition.
Remote environments are different: zsh-fork interception is local-only,
so remote unified-exec calls must keep direct unified-exec behavior and
still expose `shell` when a remote environment can be selected.
## What Changed
- Taught the `exec_command` schema builder to omit the `shell` parameter
when requested.
- Hid `shell` from the unified-exec tool schema only when zsh-fork
unified exec applies to all selectable environments.
- Kept `shell` visible when any remote environment can be targeted,
because those calls run through direct unified exec.
- Made unified exec choose the effective shell mode per selected
environment: local environments keep zsh-fork mode, remote environments
use direct mode.
- Left direct unified-exec behavior unchanged, including support for
model-specified shells there.
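
The visibility rule above can be sketched roughly as follows. This is a minimal illustration, not the planner code from this PR: `EnvKind`, `ShellMode`, `effective_shell_mode`, and `exposes_shell_param` are hypothetical names chosen for the example.

```rust
/// Hypothetical stand-in for a selectable execution environment.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EnvKind {
    Local,
    Remote,
}

/// Hypothetical stand-in for the configured unified-exec shell mode.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ShellMode {
    Direct,
    ZshFork,
}

/// Mode used for one selected environment: zsh-fork interception is
/// local-only, so remote environments always fall back to direct mode.
pub fn effective_shell_mode(configured: ShellMode, env: EnvKind) -> ShellMode {
    match (configured, env) {
        (ShellMode::ZshFork, EnvKind::Local) => ShellMode::ZshFork,
        _ => ShellMode::Direct,
    }
}

/// `shell` is hidden only when zsh-fork mode applies to every selectable
/// environment; any remote environment keeps the parameter visible.
pub fn exposes_shell_param(configured: ShellMode, selectable: &[EnvKind]) -> bool {
    let zsh_fork_everywhere = configured == ShellMode::ZshFork
        && selectable.iter().all(|env| *env == EnvKind::Local);
    !zsh_fork_everywhere
}
```

With these assumptions, a local-only zsh-fork setup hides `shell`, while adding a single remote environment makes it visible again.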
## Verification
- Added schema coverage showing `exec_command` can hide `shell`.
- Added planner coverage showing zsh-fork unified exec hides `shell` for
local-only execution while direct unified exec still exposes it.
- Added planner coverage showing `shell` remains visible when a remote
environment is available.
- Added handler coverage showing remote environments use direct
unified-exec shell mode instead of zsh-fork mode.
- Ran the focused `codex-core` shell-parameter and zsh-fork tests.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24980).
* #24982
* #24981
* __->__ #24980
## Why
`shell_zsh_fork` and unified exec need to remain independently
controllable for enterprise rollouts, but we also need a third mode that
composes them. That composed mode is intended to preserve unified exec
command lifecycle support while letting the zsh fork provide more
accurate `execv(2)` interception.
Enabling `unified_exec_zsh_fork` by itself is intentionally not
sufficient. It is a composition gate, not a dependency-enabling
shortcut:
- `unified_exec` selects the PTY-backed unified exec tool.
- `shell_zsh_fork` opts into the zsh fork backend.
- `unified_exec_zsh_fork` only allows those two already-enabled modes to
be composed so local zsh unified exec commands can launch through the
zsh fork.
This separation is deliberate. Enterprises and staged rollouts must be
able to enable or disable unified exec and zsh-fork independently. If
`unified_exec_zsh_fork` implied either dependency, then enabling one
under-development composition flag would silently activate a shell
backend that the configured feature set left disabled.
This PR introduces only the configuration and planning gate for that
composition. Existing `shell_zsh_fork` behavior continues to use the
standalone shell tool unless the new composition feature is explicitly
enabled alongside both dependencies.
## What Changed
- Added the under-development feature flag `unified_exec_zsh_fork`.
- Added `UnifiedExecFeatureMode` so the three input feature flags
collapse into `Disabled`, `Direct`, or `ZshFork` mode before tool
planning.
- Updated tool selection so zsh-fork composition requires
`unified_exec`, `shell_zsh_fork`, and `unified_exec_zsh_fork`.
- Kept the existing standalone zsh-fork shell tool behavior when only
`shell_zsh_fork` is enabled.
- Updated config schema output for the new feature flag.
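
A rough sketch of how the three flags might collapse into one mode. The enum name and variants come from this description; the `resolve_mode` function and its signature are hypothetical. The description does not say how the real code treats `unified_exec` plus `shell_zsh_fork` without the composition gate, so the sketch assumes `Direct` for that case.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum UnifiedExecFeatureMode {
    Disabled,
    Direct,
    ZshFork,
}

/// Hypothetical resolver: the composition flag never enables its
/// dependencies, it only allows two already-enabled modes to combine.
pub fn resolve_mode(
    unified_exec: bool,
    shell_zsh_fork: bool,
    unified_exec_zsh_fork: bool,
) -> UnifiedExecFeatureMode {
    match (unified_exec, shell_zsh_fork, unified_exec_zsh_fork) {
        // Without unified exec there is nothing to compose.
        (false, _, _) => UnifiedExecFeatureMode::Disabled,
        // All three flags are required for the composed mode.
        (true, true, true) => UnifiedExecFeatureMode::ZshFork,
        // Assumption: every other combination keeps direct unified exec.
        (true, _, _) => UnifiedExecFeatureMode::Direct,
    }
}
```

The point of the shape is that `unified_exec_zsh_fork` alone resolves to `Disabled`, matching the "composition gate, not a dependency-enabling shortcut" rule.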
## Verification
- Added feature and tool-config coverage for the new gate.
- Added planner coverage proving `shell_zsh_fork` remains standalone
until composition is explicitly enabled.
- Ran focused tests for `codex-features`, `codex-tools`, and the
affected `codex-core` planner case.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24979).
* #24982
* #24981
* #24980
* __->__ #24979
## Summary
- add executor filesystem canonicalization as a bound-path operation
- route remote canonicalization through the exec-server filesystem RPC
surface
- keep path normalization attached to the filesystem that owns the path
## Stack
- 2/5 in the skills path authority stack extracted from
https://github.com/openai/codex/pull/25098
- follows merged https://github.com/openai/codex/pull/25121
## Validation
- `cd
/Users/starr/code/codex-worktrees/pr-25098-restack-review-pr1b/codex-rs
&& just fmt`
- Not run: tests/checks (not requested)
- GitHub CI pending on rewritten head
## Why
Guardian auto-review normally uses the provider-preferred review model
when one is available. Some parent models need model-catalog metadata to
select a different review model while keeping older `/models` payloads
compatible when that metadata is absent.
## What changed
- Added optional `ModelInfo::auto_review_model_override` metadata to the
public model payload as a review-model slug.
- Updated Guardian review model selection to prefer the catalog override
when present, while preserving the existing provider preferred-model
path and parent-model fallback when it is omitted.
- Added focused Guardian coverage for override and no-override model
selection.
- Added an `auto_review` core integration suite test that loads override
metadata from a remote model catalog path and asserts the strict
auto-review `/responses` request uses the catalog-selected review model.
- Updated existing `ModelInfo` fixtures and local catalog constructors
for the new optional field.
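
The selection order described above can be sketched as a simple fallback chain. `select_review_model` and its parameters are hypothetical names for illustration, and the slugs in the usage are made up; the real Guardian code may carry more context than three strings.

```rust
/// Hypothetical helper: prefer the catalog override, then the provider's
/// preferred review model, then fall back to the parent model.
pub fn select_review_model<'a>(
    catalog_override: Option<&'a str>,
    provider_preferred: Option<&'a str>,
    parent_model: &'a str,
) -> &'a str {
    catalog_override
        .or(provider_preferred)
        .unwrap_or(parent_model)
}
```

When the override is omitted, the result is the same as before this change, which is what keeps older `/models` payloads compatible.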
## Validation
- `cargo test -p codex-protocol
model_info_defaults_availability_nux_to_none_when_omitted`
- `cargo test -p codex-core guardian_review_uses_`
- `cargo test -p codex-core
remote_model_override_uses_catalog_model_for_strict_auto_review --test
all`
- `just fix -p codex-protocol`
- `just fix -p codex-core`
- `just fmt`
- `git diff --check`
## Why
Python files under `scripts/` were not covered by the repository
formatting recipe or the CI formatting job, so formatting drift could
merge unnoticed.
## What
- Add a dedicated `scripts/pyproject.toml` and `scripts/uv.lock` so
root-script formatting uses a locked Ruff version.
- Extend `just fmt` to format root Python scripts and add
`fmt-scripts-check` for CI.
- Run `just fmt-scripts-check` from `.github/workflows/ci.yml`,
installing `uv` through SHA-pinned `astral-sh/setup-uv` while retaining
the `uv` `0.11.3` pin.
- Apply Ruff formatting to the root Python scripts, including
`scripts/just-shell.py`, and extend
`sdk/python/tests/test_artifact_workflow_and_binaries.py` to cover the
root formatting recipe.
- Update `AGENTS.md` so agents run `just fmt` after code changes
anywhere in the repository.
## Validation
- Extended the existing Python SDK workflow test to assert that `just
fmt` includes root Python scripts.
## Why
[#25089](https://github.com/openai/codex/pull/25089) introduced the
background worker that compresses cold archived rollouts, and
[#25654](https://github.com/openai/codex/pull/25654) made that pass
faster once it starts. But the worker still deleted
`rollout-compression.lock` on successful exit, so the existing six-hour
staleness window only helped with overlapping or crashed workers. Each
new local thread-store initialization could immediately rescan archived
rollouts even if a full pass had just finished.
This change keeps the existing marker around long enough to throttle
redundant reruns. The worker is still best-effort, but it no longer does
repeated startup scans when nothing new is eligible for compression.
## What Changed
- Replace the drop-scoped `CompressionLock` with a
`CompressionRunMarker` that claims the existing
`.tmp/rollout-compression.lock` path and leaves it in place after
success.
- Reuse the existing six-hour staleness window to block both overlapping
starts and immediate reruns, while still letting a stale marker be
reclaimed.
- Update the worker docs and debug logging to describe the new "already
running or recently ran" behavior.
- Extend the rollout compression tests to assert that a successful run
leaves the marker behind and that a fresh marker suppresses a new run.
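
As a rough illustration of the throttling decision, assuming the marker's age is already known (the real worker presumably derives it from the file on disk), the claim logic might look like this. `Claim`, `claim_marker`, and `STALE_AFTER` are hypothetical names, and treating an age of exactly six hours as stale is an assumption.

```rust
use std::time::Duration;

/// Six-hour staleness window described above.
pub const STALE_AFTER: Duration = Duration::from_secs(6 * 60 * 60);

#[derive(Debug, PartialEq, Eq)]
pub enum Claim {
    /// No marker existed, so this run creates one.
    Claimed,
    /// A stale marker was found and taken over.
    Reclaimed,
    /// A fresh marker exists: another run is active or recently finished.
    Blocked,
}

/// Hypothetical decision function; `marker_age` is `None` when no marker
/// file exists yet.
pub fn claim_marker(marker_age: Option<Duration>) -> Claim {
    match marker_age {
        None => Claim::Claimed,
        Some(age) if age >= STALE_AFTER => Claim::Reclaimed,
        Some(_) => Claim::Blocked,
    }
}
```

Because the marker is no longer deleted on success, the `Blocked` branch now covers both overlapping workers and immediate reruns.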
## Validation
- `just test -p codex-rollout`
## Why
`codex_core` is consistently a bottleneck for incremental builds during
iteration. The simplest fix is to make the crate smaller.
## Summary
`codex-core` owns several reusable prompt renderers and static prompt
assets, which makes the crate harder to split apart.
Rename `codex-review-prompts` to `codex-prompts` and move shared review,
goal, permissions, compaction, realtime, hierarchical AGENTS.md, and
`apply_patch` prompts into it. Move prompt-only tests and update
consumers and `CODEOWNERS`.
## Validation
- `just test -p codex-prompts -p codex-apply-patch`
- `just test -p codex-core prompt_caching`
- Bazel builds for the affected crates
## Summary
Make the root `justfile` usable from Windows without maintaining a
separate Windows copy of most recipes.
The repo recipes previously assumed POSIX shell behavior for things like
variadic argument forwarding (`"$@"`) and stderr redirection
(`2>/dev/null`). That made common workflows such as `just fmt`, `just
test`, and `just log` unreliable from Windows. This PR introduces a
small cross-platform shell adapter so recipes can stay mostly unified
while still expanding the few shell-specific constructs correctly on
macOS/Linux and Windows.
## What Changed
- Add `scripts/just-shell.py` as the configured `just` shell adapter.
- On Unix it invokes `sh -cu`.
- On Windows it invokes `pwsh -CommandWithArgs` so arguments containing
spaces are preserved.
- Add portable recipe placeholders:
- `{args}` expands to `"$@"` on Unix and the equivalent PowerShell
forwarded-args expression on Windows.
- `{stderr-null}` expands to the platform-specific stderr suppression
used by `fmt`.
- Convert most variadic one-line recipes to the unified `{args}` form,
including `codex`, `exec`, `file-search`, `app-server-test-client`,
`fix`, `clippy`, `bench`, `mcp-server-run`, `write-app-server-schema`,
and `argument-comment-lint-from-source`.
- Keep genuinely shell-specific recipes split or Unix-only for now,
including recipes backed by `.sh` scripts or recipes whose bodies are
more than simple command forwarding.
- Add a Windows `just install` path that installs PowerShell via
`winget` when `pwsh` is not available, then runs the same basic Rust
setup steps.
- Update the SDK test that validates the root `fmt` recipe so it
recognizes the new portable stderr placeholder.
## Validation
- `just --summary`
- `just --dry-run fmt`
- `just --dry-run bench-smoke`
- `just --dry-run codex foo "bar binky" baz`
- `just --dry-run write-hooks-schema`
- `just --dry-run bazel-lock-update`
- `just --dry-run argument-comment-lint-from-source -- "foo bar"`
- `git diff --check -- justfile scripts/just-shell.py
sdk/python/tests/test_artifact_workflow_and_binaries.py`
- Verified Windows argv preservation through `scripts/just-shell.py`
with arguments containing spaces.
- `uv run --frozen --project sdk/python --extra dev pytest
sdk/python/tests/test_artifact_workflow_and_binaries.py::test_root_fmt_recipe_formats_rust_and_python_sdk`
## Summary
- Preserve app declaration order when loading plugin .app.json files.
- Keep plugin connector summaries in plugin app order after connector
metadata is merged and filtered.
- Add regression coverage for .app.json order and connector summary
order.
## Validation
- just fmt
- just test -p codex-chatgpt
connectors_for_plugin_apps_returns_only_requested_plugin_apps
- just test -p codex-core-plugins
effective_apps_preserves_app_config_order
- just fix -p codex-core-plugins (passes with existing clippy
large_enum_variant warning in core-plugins/src/manifest.rs)
- just fix -p codex-chatgpt
- just bazel-lock-update
- just bazel-lock-check
## Summary
Renames the MultiAgentV2 turn-triggering tool from `assign_task` to
`followup_task` so the exposed tool name better describes sending an
additional task to an existing agent.
This updates the tool spec, handler/module names, registry wiring,
default multi-agent v2 usage hints, and tests. Rollout trace
classification keeps accepting legacy `assign_task` events so older
traces still reduce correctly, while docs show the new tool name.
## Test plan
- `just test -p codex-core followup_task`
- `just test -p codex-core -E
'test(multi_agent_feature_selects_one_agent_tool_family) |
test(multi_agent_v2_can_use_configured_tool_namespace) |
test(code_mode_only_can_expose_namespaced_multi_agent_v2_as_normal_tools)'`
- `just test -p codex-rollout-trace`
- `just fix -p codex-core`
- `just fix -p codex-rollout-trace`
Notes: `just fmt` ran `cargo fmt` but failed in the Python ruff phase
because the local environment could not resolve `hatchling>=1.27.0` from
the configured internal registry. A full `just test -p codex-core` also
hit unrelated environment-sensitive integration failures involving
missing spawned test binaries/sandbox behavior; the changed multi-agent
spec/handler tests passed in the filtered runs above.
## Summary
- add public `codex_exec_server::EnvironmentPathRef`
- bind an absolute path to its owning executor filesystem
- keep path operations in the next review slice
## Stack
- 1/5 in the skills path authority stack extracted from
https://github.com/openai/codex/pull/25098
## Validation
- `cd /Users/starr/code/codex-worktrees/pr-25098-restack4/codex-rs &&
just fmt`
- GitHub CI pending on rewritten head
## Why
[#25089](https://github.com/openai/codex/pull/25089) added the
background worker for compressing cold archived rollouts, but the worker
still processed files effectively one at a time: each compression job
was sent to `spawn_blocking` and then awaited before the next file
started. On machines with a backlog of archived rollouts, that makes
catch-up slower than it needs to be even though the actual compression
work already runs off the async runtime.
## What Changed
- Queue rollout compression work in a `JoinSet` while directory
traversal continues.
- Cap the worker at two in-flight compression jobs so it can overlap
compression without turning the background task into unbounded blocking
work.
- Drain pending jobs before returning, including the
`read_dir.next_entry()` error path, so every launched job still
contributes to the final `compressed`, `skipped`, and `failed` stats.
- Treat task join failures the same way as compression failures in the
worker's warning and failure accounting.
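
The bounded-concurrency and drain-before-return behavior can be sketched without async machinery. This is not the `JoinSet` code from the PR: to stay dependency-free it swaps in plain `std::thread` handles, and `Stats`, `Outcome`, and `run_bounded` are hypothetical names.

```rust
use std::collections::VecDeque;
use std::thread::{self, JoinHandle};

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Stats {
    pub compressed: usize,
    pub skipped: usize,
    pub failed: usize,
}

pub enum Outcome {
    Compressed,
    Skipped,
    Failed,
}

/// Join failures (a panicked job) are counted like compression failures.
fn record(stats: &mut Stats, joined: thread::Result<Outcome>) {
    match joined {
        Ok(Outcome::Compressed) => stats.compressed += 1,
        Ok(Outcome::Skipped) => stats.skipped += 1,
        Ok(Outcome::Failed) | Err(_) => stats.failed += 1,
    }
}

/// Runs jobs with at most `max_in_flight` running at once and drains every
/// launched job before returning, so all of them reach the final stats.
pub fn run_bounded<I, F>(jobs: I, max_in_flight: usize) -> Stats
where
    I: IntoIterator<Item = F>,
    F: FnOnce() -> Outcome + Send + 'static,
{
    let mut stats = Stats::default();
    let mut in_flight: VecDeque<JoinHandle<Outcome>> = VecDeque::new();
    for job in jobs {
        // At the cap: wait for the oldest job before launching another.
        if in_flight.len() >= max_in_flight.max(1) {
            if let Some(handle) = in_flight.pop_front() {
                record(&mut stats, handle.join());
            }
        }
        in_flight.push_back(thread::spawn(job));
    }
    // Drain whatever is still running, including after an early stop.
    while let Some(handle) = in_flight.pop_front() {
        record(&mut stats, handle.join());
    }
    stats
}
```

One difference from a `JoinSet` is that this sketch waits on the oldest job rather than whichever finishes first; the cap and the final drain are the parts it is meant to illustrate.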
## Summary
- Configure the rust-release build job with
`CARGO_NET_GIT_FETCH_WITH_CLI=true`
- Document the macOS SecureTransport/libgit2 failure mode that hit the
`libwebrtc`/`libyuv` git submodule fetch
## Root cause
The release run at
https://github.com/openai/codex/actions/runs/26717498860/job/78745156683
repeatedly failed before compilation because Cargo's libgit2 fetch path
could not clone the nested `yuv-sys/libyuv` submodule from
`chromium.googlesource.com`, ending with `SecureTransport error:
connection closed via error`.
## Validation
- `git diff --check`
This is a workflow-only change, so I did not run Rust package tests.
## Why
Codex 0.135.0 started shipping bundled SQLite 3.51.x via SQLx 0.9.0 to
avoid the older WAL corruption bug fixed by #24728. On Windows x64,
#25367 reports an immediate `STATUS_ILLEGAL_INSTRUCTION` crash on a
Haswell CPU when starting normal Codex paths.
Rather than downgrading SQLite, this keeps the newer bundled SQLite
source and removes SQLite compiler-intrinsic code paths from the Windows
x64 release build.
## What changed
For `x86_64-pc-windows-msvc` release builds, export
`LIBSQLITE3_FLAGS=SQLITE_DISABLE_INTRINSIC` before `cargo build` in:
- `.github/workflows/rust-release.yml`
- `.github/workflows/rust-release-windows.yml`
Other targets keep their current SQLite build flags.
## Verification
- `git diff --check`
## Rollout compression stack
This stack splits #24941 into reviewable steps for local rollout
compression. The design is intentionally staged:
1. Teach readers, listing, search, and lookup to understand compressed
rollouts.
2. Make append and resume paths materialize compressed rollouts back to
plain JSONL before writing.
3. Add a disabled-by-default worker that can compress cold archived
rollouts behind `local_thread_store_compression`.
The key invariant is that writers append to plain `.jsonl`. A
`.jsonl.zst` file is a cold/read representation; if a write is needed,
the compressed file is materialized back to plain JSONL first. Readers
prefer plain `.jsonl` when both forms exist and can fall back to the
compressed sibling during transitions.
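
The reader-side preference can be sketched as below. `compressed_sibling` and `resolve_read_path` are hypothetical helpers, and the existence check is passed in as a closure only to keep the example testable without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Appends `.zst` to the logical `.jsonl` path to name the cold form.
pub fn compressed_sibling(logical: &Path) -> PathBuf {
    let mut name = logical.as_os_str().to_os_string();
    name.push(".zst");
    PathBuf::from(name)
}

/// Prefers the plain file when both forms exist and falls back to the
/// compressed sibling; returns `None` when neither is present.
pub fn resolve_read_path(
    logical: &Path,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    if exists(logical) {
        return Some(logical.to_path_buf());
    }
    let compressed = compressed_sibling(logical);
    if exists(&compressed) {
        Some(compressed)
    } else {
        None
    }
}
```

Writers would not use this resolver directly: under the invariant above they materialize back to the plain `.jsonl` path before appending.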
The worker is deliberately the last PR and remains behind an
under-development feature flag. It currently scans only
`archived_sessions`, not active `sessions`, because active sessions have
the highest resume/append race risk. That means this stack does not yet
compress most unarchived local history.
## Known race / follow-up
The remaining unresolved design question is writer/compressor
coordination. Even for archived rollouts, a resume or metadata update
can append while the worker is replacing the plain file with
`.jsonl.zst`; the current double-stat checks narrow but do not fully
eliminate the window where a writer has opened the plain file before
unlink. Do not treat the worker PR as production-ready until we either:
- prevent append/resume paths from racing archived compression, or
- introduce a shared representation/append lock or equivalent
coordination.
The first two PRs are useful independently: they make compressed
rollouts readable and make append paths safely recover back to plain
JSONL. The third PR isolates the worker behavior so that coordination
issue is reviewable separately.
## Validation
Focused local validation for the stack includes:
- `just test -p codex-rollout`
- `just test -p codex-thread-store` where thread-store paths were
touched
- `just test -p codex-features` for the feature flag slice
- `just bazel-lock-check` after dependency graph changes
- scoped `just fix -p ...` passes for changed crates
CI is still the source of truth for the full platform matrix.
## This PR in the stack
This is PR 3/3, based on #25088. It adds the under-development feature
flag and starts the best-effort background worker when enabled. The
worker currently compresses only cold archived rollouts, skips active
sessions, verifies compressed output, preserves mtime and permissions,
keeps a store-level lock heartbeat, and cleans stale temp files.
Stack order:
1. #25087: read compressed local rollouts.
2. #25088: materialize compressed rollouts before append.
3. This PR: add the disabled local compression worker.
## Summary
- preserve existing explicit SQLite thread titles during rollout
reconciliation/backfill when the incoming rollout title is only
first-message-derived
- keep stale inferred-title repair behavior while avoiding session-index
scans during startup backfill
- add a regression test for renamed titles surviving reconcile
## Testing
- just fmt
- just test -p codex-rollout
- just test -p codex-state
Closes #24886.
## Why
Users can configure the TUI status line and terminal title with
`model-with-reasoning`, but issue #24886 asks for a compact
reasoning-only item. That lets a setup show just `default`, `low`,
`medium`, `high`, or `xhigh` without repeating the model name.
## What changed
- Added a `reasoning` item for `/statusline` and `/title` setup flows.
- Rendered the item from the effective reasoning effort, including
collaboration-mode overrides.
- Registered `reasoning` with `codex doctor` so Codex-generated
terminal-title config is not reported as invalid.
- Updated TUI setup snapshots so the picker previews include the new
item.
## Summary
Fixes #25295.
The slash-command popup reused its previous `ScrollState` when the
composer filter token changed. After scrolling the full `/` command
list, typing a narrower filter such as `/st` could clamp the stale
selection into the filtered results and highlight the wrong command.
This resets the popup selection and viewport only when the parsed filter
token changes, so normal arrow navigation is preserved while new filters
start at the first match.
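
A minimal sketch of the reset rule, assuming a simplified popup that tracks only the filter token, selection, and viewport offset. `CommandPopup` and its methods are hypothetical and much smaller than the real `ScrollState` handling.

```rust
#[derive(Debug, Default)]
pub struct CommandPopup {
    pub filter: String,
    pub selected: usize,
    pub scroll_top: usize,
}

impl CommandPopup {
    /// Resets selection and viewport only when the parsed token changes,
    /// so re-applying the same filter keeps arrow-key navigation intact.
    pub fn set_filter(&mut self, token: &str) {
        if self.filter != token {
            self.filter = token.to_string();
            self.selected = 0;
            self.scroll_top = 0;
        }
    }

    /// Moves the selection down, clamped to the visible match count.
    pub fn move_down(&mut self, match_count: usize) {
        if match_count > 0 {
            self.selected = (self.selected + 1).min(match_count - 1);
        }
    }
}
```

Calling `set_filter` with an unchanged token is a no-op, which is what preserves normal navigation between keystrokes that do not alter the filter.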
## Why
`codex app [PATH]` is the documented CLI entry point for opening Codex
Desktop on a workspace. Recent desktop builds can focus the app while
failing to honor paths passed as macOS document-open arguments via `open
-a Codex.app <workspace>`, which broke `codex app .` for users. See
#25333; related report: #25166.
The desktop app still supports the explicit
`codex://threads/new?path=...` route, so the CLI should use that
app-owned launch surface instead of depending on folder-open event
delivery.
## What Changed
- Build a `codex://threads/new?path=<workspace>` URL in the macOS app
launcher.
- Pass that URL to `open -a <Codex.app>` instead of passing the
workspace path as a document argument.
- Add coverage that workspace paths needing escaping round-trip through
URL query encoding.
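
For illustration, the URL construction might look like the sketch below. It hand-rolls percent-encoding with the standard library and encodes everything outside the RFC 3986 unreserved set; the actual launcher may use a URL crate and a different encode set. `percent_encode_component` and `new_thread_url` are hypothetical names.

```rust
/// Percent-encodes every byte outside the RFC 3986 unreserved set.
pub fn percent_encode_component(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}

/// Builds the app-owned route that carries the workspace as a query value.
pub fn new_thread_url(workspace: &str) -> String {
    format!(
        "codex://threads/new?path={}",
        percent_encode_component(workspace)
    )
}
```

Encoding characters such as spaces and `&` matters here because an unescaped `&` would otherwise be read as the start of another query parameter.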
## Verification
- `just test -p codex-cli codex_new_thread_url_encodes_workspace_path`
## Summary
I frequently want to be able to paste into the searchable menu -- the
most common use-case here is when specifying an upstream for a
`/review`, where I copy the upstream from an open terminal.
## Why
`codex exec` was forcing headless runs to `approval_policy = "never"`
even when the resolved reviewer was `auto_review`. That prevented
unattended exec workflows from reaching the reviewed MCP write path they
were configured to use.
## What changed
- Keep the existing headless `never` default for ordinary exec runs.
- Re-resolve exec config without that synthetic override when the final
reviewer resolves to `AutoReview`, so configured or requirements-driven
approval policy is preserved.
- Add regression coverage for:
- `auto_review` plus `on-request` from user config
- requirements-driven `AutoReview`, asserting exec’s final approval
policy matches the no-override control config exactly
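
The resulting policy can be summarized with a small sketch. The real change re-resolves the exec config; this example collapses that into a single function over already-resolved values. `ApprovalPolicy`, `Reviewer::User`, and `exec_approval_policy` are hypothetical names, with only `AutoReview` taken from the description.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ApprovalPolicy {
    Never,
    OnRequest,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Reviewer {
    User,
    AutoReview,
}

/// Headless exec keeps its synthetic `never` default unless the final
/// reviewer is `AutoReview`, in which case the configured policy wins.
pub fn exec_approval_policy(configured: ApprovalPolicy, reviewer: Reviewer) -> ApprovalPolicy {
    match reviewer {
        Reviewer::AutoReview => configured,
        Reviewer::User => ApprovalPolicy::Never,
    }
}
```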
## Validation
- `just fmt`
- `cargo test -p codex-exec`
## TL;DR
Pressing Esc or Ctrl+C after sending a prompt but before any output has
rendered restores the submitted message to the composer.
## Summary
Cancelling a prompt immediately after submission should behave like
returning to edit that prompt, not like discarding the user's draft.
Today, pressing `Esc` or `Ctrl+C` before Codex responds leaves the
submitted prompt in the transcript and returns an empty composer,
forcing the user to recall or retype it.
When an interrupted turn has not produced substantive visible output,
restore its submitted prompt directly into the composer and roll back
that latest turn. This also covers the first prompt in a fresh thread,
before the TUI has retained a local user-history cell. The restored
draft keeps its text, image attachments, and active collaboration mode
so it can be edited and resubmitted in place.
Restoration is intentionally suppressed once the turn has produced
user-visible activity such as assistant output, tool work, hooks, or
patches. A transient thinking status does not make the prompt
ineligible. Rollback also rebuilds terminal scrollback from the retained
transcript cells so repeated cancellations and terminal resizes do not
duplicate history.
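
The eligibility rule can be sketched as a predicate over the activity a turn has produced so far. `TurnActivity` and `should_restore_prompt` are hypothetical names; the real TUI tracks this through its own event and history types.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TurnActivity {
    ThinkingStatus,
    AssistantOutput,
    ToolWork,
    Hook,
    Patch,
}

/// A cancelled turn is restorable only if nothing user-visible happened;
/// a transient thinking status does not count as visible output.
pub fn should_restore_prompt(activity: &[TurnActivity]) -> bool {
    activity
        .iter()
        .all(|event| matches!(event, TurnActivity::ThinkingStatus))
}
```

An empty activity list is restorable too, which covers cancelling immediately after submission.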
## How to Test
1. Start the TUI with `cargo run -p codex-cli --bin codex`.
2. In a fresh thread, submit the first prompt and press `Esc` before
Codex emits substantive output. Confirm that the prompt returns to the
composer for editing and its submitted transcript row is removed.
3. Repeat with `Ctrl+C`, then repeat after at least one completed turn.
Confirm the same behavior.
4. Submit a prompt, wait for assistant output or tool activity, then
cancel. Confirm that the transcript remains intact and the prompt is not
restored into the composer.
5. Cancel several output-free prompts and resize the terminal between
attempts. Confirm that the startup banner, tip, and transcript history
do not duplicate in scrollback.
Targeted tests:
- `just test -p codex-tui cancelled_turn_edit_restores_prompt`
- `just test -p codex-tui
output_free_interrupted_turn_requests_prompt_restore`
- `just test -p codex-tui
visible_output_prevents_cancelled_turn_prompt_restore`
- `just test -p codex-tui
thinking_status_keeps_cancelled_turn_prompt_restore_eligible`
- `just test -p codex-tui
patch_activity_prevents_cancelled_turn_prompt_restore`
The full `just test -p codex-tui` run completed with `2746` passing
tests and two unrelated existing guardian feature-flag failures. `just
argument-comment-lint` remains blocked locally by the existing Bazel
LLVM `compiler-rt` sanitizer-header glob failure; the touched Rust diff
was manually audited for positional literal comments.
## Summary
- initialize `parent_thread_id` in the compressed rollout test fixture's
`SessionMeta`
- restore rollout test compilation across Bazel test, clippy,
release-build, and argument-comment-lint jobs
## Root cause
PR #25087 (`Read compressed rollouts and materialize before append`)
added `codex-rs/rollout/src/compression_tests.rs` in merge commit
`a8a6071279b6f3112fcc5fc3fee69c48473d7149`. Its `write_rollout` fixture
constructs `SessionMeta` without the required `parent_thread_id` field,
causing `error[E0063]` when Bazel compiles `rollout-unit-tests-bin` on
`main` and downstream PRs.
## Validation
- `UV_CACHE_DIR=/private/tmp/codex-uv-cache just fmt`
- `just test -p codex-rollout` (`59` tests passed; bench smoke passed)
- `git diff --check`
- manually audited the touched Rust diff for positional literal argument
comments; the change adds no positional callsite
## Local lint blocker
- `just argument-comment-lint` could not reach source inspection locally
because Bazel's LLVM dependency fails analysis:
`compiler-rt/BUILD.bazel` glob `include/sanitizer/*.h` matched no files.
## Why
Local rollout compression needs a cold `.jsonl.zst` representation
without letting compressed physical paths leak into append-mode writers.
The unsafe case is resume or metadata update code successfully reading a
compressed rollout and then appending raw JSONL bytes to the zstd file.
This PR folds the former #25088 materialization slice into the
read-support PR so the reader changes and append-safety invariant land
together.
## What Changed
- Teach rollout readers, discovery, listing, search, and ID lookup to
understand compressed `.jsonl.zst` rollouts.
- Keep `.jsonl` as the logical/stored rollout path while allowing read
paths to open either plain or compressed storage.
- Materialize compressed rollouts back to plain `.jsonl` before
append-mode writes, including resume and direct metadata append paths.
- Preserve compressed-file permissions when materializing back to plain
JSONL.
- Refresh thread-store resolved rollout paths after compatibility
metadata writes so reconciliation follows the materialized file.
- Avoid treating transient compression temp files as real rollout lookup
results.
## Remaining Stack
#25089 remains the separate worker PR. It is based directly on this PR
and stays behind the disabled `local_thread_store_compression` feature
flag.
The worker still has a broader coordination question: a resume or
metadata update can race with background compression while a plain file
is being replaced by `.jsonl.zst`. This PR handles the read and
materialize-before-append primitives; it does not make the worker
production-ready.
## Validation
- `just test -p codex-rollout`
- `just test -p codex-thread-store`
- `just fix -p codex-rollout`
- `just fix -p codex-thread-store`
- `just bazel-lock-check`
## Summary
- add an extension-owned `GoalApi` for thread goal get/set/clear
operations
- register live goal runtimes with the API from the goal extension
backend
- cover the API and runtime-effect paths in goal extension tests
## Stack
Follow-up app-server wiring PR: #25108
## Validation
- `just fmt`
- `just fix -p codex-goal-extension`
- `just test -p codex-goal-extension`
## Why
`try_start_turn_if_idle` is the core helper for starting injected input
only when the session is actually idle. It should stay focused on
generic turn-lifecycle safety. The previous `ModeKind::Plan` guard mixed
caller policy into that helper: Plan mode may choose not to auto-start
some extension work, but that decision belongs at the extension or
caller boundary rather than in the session injection primitive.
## What changed
- Removed the `ModeKind::Plan` early return from
`Session::try_start_turn_if_idle`.
- Removed the now-unused `ModeKind` import from
`core/src/session/inject.rs`.
## Testing
Not run locally.
## Why
Goal steering prompts have grown into long inline Rust strings, which
makes the authored prompt text hard to review and easy to damage while
changing the surrounding plumbing. Moving those prompts into embedded
Markdown templates keeps the policy text in the shape reviewers actually
read, while preserving the existing runtime substitution and objective
escaping behavior.
## What changed
- Added `ext/goal/templates/goals/continuation.md`, `budget_limit.md`,
and `objective_updated.md` for the three goal steering prompts.
- Updated `ext/goal/src/steering.rs` to parse those embedded templates
once with `codex-utils-template` and render the existing goal values
into them.
- Kept user objectives XML-escaped before rendering and converted budget
counters into template variables.
- Added the template directory to `ext/goal/BUILD.bazel` `compile_data`
so Bazel has the same embedded prompt inputs as Cargo.
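
To illustrate the escape-then-render order, here is a dependency-free sketch. It does not use `codex-utils-template`: the `{{ objective }}` placeholder syntax, the set of escaped characters, and both function names are assumptions made for the example.

```rust
/// Escapes the five XML-significant characters in user-provided text.
pub fn xml_escape(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for ch in raw.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            other => out.push(other),
        }
    }
    out
}

/// Substitutes the escaped objective into a template with a hypothetical
/// `{{ objective }}` placeholder.
pub fn render_objective(template: &str, objective: &str) -> String {
    template.replace("{{ objective }}", &xml_escape(objective))
}
```

Escaping before substitution keeps a user objective from closing or opening tags in the authored prompt text.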
## Testing
- Not run locally.
## Why
The goal extension needs a way to resume an active goal after the thread
becomes idle, but the old core goal runtime should not be refactored as
part of this step. The missing piece is a small core-owned turn-start
primitive: let an extension ask for a normal model turn only when the
thread is idle, and otherwise fail without injecting into whatever is
currently active.
## What Changed
- Adds `CodexThread::try_start_turn_if_idle(...)` as the narrow
extension-facing primitive for synthetic idle work.
- Implements the session side so it refuses to start when:
- the provided input is empty,
- the session is in plan mode,
- a turn is already active, or
- trigger-turn mailbox work is pending.
- Gives trigger-turn mailbox work priority if it appears while the idle
turn is being prepared.
- Wires `GoalExtension::on_thread_idle` to read the active persisted
goal and submit the continuation prompt through this idle-only
primitive.
- Keeps the legacy core goal continuation implementation in place
instead of folding it into this PR.
## Behavior
This is intentionally best-effort. If `try_start_turn_if_idle` observes
that the thread is not idle, or that higher-priority mailbox work should
run first, it returns the input to the caller. The goal extension drops
that continuation prompt and waits for a future idle opportunity instead
of injecting stale synthetic goal text into an active turn.
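
A simplified sketch of the idle-only contract, using a plain struct in place of the real session. `SessionState`, its fields, and the `Vec<String>` input type are hypothetical; only the method name and the refusal conditions come from this description.

```rust
#[derive(Debug, Default)]
pub struct SessionState {
    pub plan_mode: bool,
    pub turn_active: bool,
    pub mailbox_pending: bool,
    /// Inputs that actually started a turn, kept for inspection.
    pub started: Vec<Vec<String>>,
}

impl SessionState {
    /// Starts a turn only when the session is idle; otherwise hands the
    /// input back so the caller can drop it or retry on a later idle.
    pub fn try_start_turn_if_idle(
        &mut self,
        input: Vec<String>,
    ) -> Result<(), Vec<String>> {
        let blocked = input.is_empty()
            || self.plan_mode
            || self.turn_active
            || self.mailbox_pending;
        if blocked {
            return Err(input);
        }
        self.turn_active = true;
        self.started.push(input);
        Ok(())
    }
}
```

Returning the input in the error case is what lets the goal extension discard a stale continuation prompt instead of injecting it into an active turn.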
## Validation
- `just test -p codex-core
try_start_turn_if_idle_rejects_active_turn_without_injecting`
- `just test -p codex-goal-extension`
## Summary
- default multi-agent v2 to direct-model-only tools so code mode does
not wrap subagent tools
- add default root/subagent team prompts aligned with dogfood training
assumptions
- tighten spawn-agent model override wording to prefer the inherited
model by default
## Tests
- just fmt
- just test -p codex-core
spawn_agent_description_lists_visible_models_and_reasoning_efforts
- just test -p codex-core
multi_agent_v2_default_session_thread_cap_counts_root
- just test -p codex-rollout-trace
- just fix -p codex-core
- just fix -p codex-rollout-trace
Note: a broad just test -p codex-core run was attempted locally, but
this sandbox produced unrelated environment failures around
sandbox-exec, missing test_stdio_server, and realtime timeouts.
## Why
The review discussion at
https://github.com/openai/codex/pull/24161#discussion_r3325692763
revealed a subagent data modeling issue: we overloaded
`forked_from_id` to also mean `parent_thread_id`. That is incorrect,
since guardian and review subagents can be subagents without forking
the main thread's history.
The solution here is to explicitly store a new `parent_thread_id` on
`SessionMeta`, alongside `forked_from_id` which already exists. While
we're at it, also expose it in the app-server protocol on the `Thread`
object.
A thread->subagent relationship and a fork of thread history are
orthogonal concepts.
## What Changed
- Added top-level `parent_thread_id` persistence on `SessionMeta` and
runtime/session plumbing through `SessionConfiguredEvent`,
`CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`,
`TurnContext`, and `ModelClient`.
- Made turn metadata, request headers, analytics, and subagent-start
events read the separate runtime/top-level parent field instead of
deriving general parent lineage from `SessionSource` or
`forked_from_thread_id`.
- Passed parent lineage separately at delegated subagent, review,
guardian, agent-job, and multi-agent spawn construction sites;
copied-history fork lineage remains derived only from `InitialHistory`.
- Persisted and exposed parent lineage through rollout/thread-store
projections and app-server v2 `Thread.parentThreadId`.
- Updated app-server README text and regenerated app-server schema
fixtures for the additive `parentThreadId` response field.
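As a rough illustration of why the two fields are kept separate, the sketch below models a guardian-style subagent that has a parent thread but no copied history. `SessionMetaSketch` and `guardian_subagent` are hypothetical stand-ins invented for this example; only the field names `parent_thread_id` and `forked_from_id` come from the description above.

```rust
/// Hypothetical, trimmed-down stand-in for the persisted session metadata.
/// Only the two lineage fields discussed above are modeled.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SessionMetaSketch {
    /// Thread whose history was copied into this one, if any.
    pub forked_from_id: Option<String>,
    /// Thread that spawned this one as a subagent, if any.
    pub parent_thread_id: Option<String>,
}

impl SessionMetaSketch {
    /// True when some thread spawned this one, regardless of history.
    pub fn is_subagent(&self) -> bool {
        self.parent_thread_id.is_some()
    }

    /// True when this thread started from a copy of another thread's history.
    pub fn is_fork(&self) -> bool {
        self.forked_from_id.is_some()
    }
}

/// Illustrative constructor: a subagent with a parent and no forked history.
pub fn guardian_subagent(parent: &str) -> SessionMetaSketch {
    SessionMetaSketch {
        forked_from_id: None,
        parent_thread_id: Some(parent.to_string()),
    }
}
```

With the old overloaded field, this case could not be represented without also claiming a fork.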
## Summary
PR 3 of 5 in the cloud-managed config client stack.
Adds enterprise-managed cloud config as a first-class config layer
source. The layer metadata is preserved through config loading,
diagnostics, debug output, hook attribution, and app-server protocol
surfaces.
## Details
- Enterprise-managed config becomes a normal config layer source with
backend-supplied `id` and display `name` attached for provenance.
- These layers are designed to behave like non-file managed config: they
can surface syntax/type diagnostics by layer name even though there is
no physical config file.
- Relative path settings are resolved from a stored config base so
cloud-delivered config remains consistent with existing MDM-delivered
config semantics.
- Hook attribution distinguishes config-delivered hooks from
requirements-delivered hooks via `HookSource::CloudManagedConfig`.
- This remains pull-based and snapshot-oriented; the PR adds layer
identity/diagnostics, not dynamic reload behavior.
## Validation
Validated through the targeted stack checks after rebasing onto current
`main`:
- Rust crate tests for
config/hooks/cloud-config/backend-client/app-server-protocol
- Filtered `codex-core` and `codex-app-server` `cloud_config_bundle`
tests
- Python generated-file contract test
- `cargo shear --deny-warnings`
- Targeted `argument-comment-lint` for config/hooks
## Summary
PR 2 of 5 in the cloud-managed config client stack.
Adds a shared requirements-layer composition engine. The composer
defines how ordered requirements layers combine, with focused tests for
the merge semantics and provenance behavior. The final PR in the stack
wires runtime requirements sources into this path.
## Details
- Mental model: requirements layers are ordered lowest priority first,
matching `ConfigLayerStack`; lower-priority layers provide defaults
while higher-priority layers win scalar/list conflicts.
- Regular fields use config-style TOML merging, including recursive
table merging, so requirements layering follows the same broad model as
`config.toml` layering.
- Domain-specific fields keep explicit semantics: `rules.prefix_rules`
and hooks preserve high-priority-first output, hooks fail closed on
active managed-dir conflicts, and `permissions.filesystem.deny_read`
dedupes as a stable high-priority-first union.
- `remote_sandbox_config` is evaluated within each layer before the
regular TOML merge, so host-specific sandbox constraints do not leak
across layers.
- Provenance points at the exact source when one layer owns a value and
uses composite provenance when a table field is assembled from multiple
layers.
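The regular-field behavior described above (lowest priority first, recursive table merge, higher priority wins scalar and list conflicts) can be sketched roughly as follows. This is a minimal model under stated assumptions: `Value`, `table`, `merge_into`, and `compose` are hypothetical names, real TOML values are replaced by a tiny enum, and the domain-specific rules for `rules.prefix_rules`, hooks, `deny_read`, `remote_sandbox_config`, and provenance are not modeled.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Tiny stand-in for a TOML value (assumption: strings only, for brevity).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Scalar(String),
    List(Vec<String>),
    Table(BTreeMap<String, Value>),
}

/// Convenience constructor for a table value.
pub fn table(entries: Vec<(&str, Value)>) -> Value {
    Value::Table(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Merge `higher` into `base`: tables merge key by key, recursively;
/// any other combination is replaced by the higher-priority value.
pub fn merge_into(base: &mut Value, higher: &Value) {
    match (base, higher) {
        (Value::Table(b), Value::Table(h)) => {
            for (key, value) in h {
                match b.entry(key.clone()) {
                    Entry::Occupied(mut slot) => merge_into(slot.get_mut(), value),
                    Entry::Vacant(slot) => {
                        slot.insert(value.clone());
                    }
                }
            }
        }
        (slot, other) => *slot = other.clone(),
    }
}

/// Compose layers ordered lowest priority first.
pub fn compose(layers: &[Value]) -> Value {
    let mut out = Value::Table(BTreeMap::new());
    for layer in layers {
        merge_into(&mut out, layer);
    }
    out
}
```

Calling `compose(&[low, high])` keeps keys that only `low` defines, while any scalar or list that both layers define takes the value from `high`.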
## Validation
Local validation:
- `just fmt`
- `cargo check -p codex-config`
- `just test -p codex-config requirements_composition`
- `git diff --check`
CI will run the broader test matrix.
## Why
We want a manual mode that produces the full packaged unsigned macOS
Codex archive, including bundled resources like `rg`, without mixing
those archives into the signing and publishing flow.
The existing `build_unsigned` mode is the handoff used by external
signing and `promote_signed`, so archive-only inspection and local
packaging should live in a separate mode and artifact namespace.
## What Changed
- added `build_unsigned_archive` as a new manual `release_mode`
- kept the existing `build` matrix running for that mode instead of
introducing a separate archive-only job
- wrote unsigned macOS package archives to
`codex-rs/unsigned-archive-dist/...` instead of the normal `dist/...`
tree
- uploaded those packaged macOS outputs as dedicated
`*-unsigned-archive` workflow artifacts
- kept `build_unsigned` and `promote_signed` on their existing raw
unsigned binary path
## Validation
- parsed `.github/workflows/rust-release.yml` with `ruby -e 'require
"yaml"; YAML.load_file(".github/workflows/rust-release.yml")'`
- ran `git diff --check -- .github/workflows/rust-release.yml`
- reviewed the workflow diff to confirm `build_unsigned_archive` now
reuses the existing `build` job while isolating the unsigned macOS
package archives under dedicated artifact names
- locally verified the package builder layout against unsigned macOS
binaries to confirm the packaged archive contains `bin/codex`,
`codex-path/rg`, and `codex-resources/zsh/bin/zsh`
## Summary
PR 1 of 5 in the cloud-managed config client stack.
Adds the generated backend models and client transport surface for the
config bundle endpoint. This bundle endpoint is the replacement backend
surface for legacy cloud requirements; the final PR in the stack
switches runtime consumers over to it.
## Details
- This is transport-only plumbing: no runtime config behavior changes in
this PR.
- The bundle endpoint is the new shared backend surface for
cloud-delivered config and requirements data.
- Both supported path styles are wired here: `/api/codex/config/bundle`
and `/wham/config/bundle`.
- The response types come from generated backend models so later PRs
consume the backend contract directly instead of maintaining
hand-written mirror structs.
## Validation
Validated through the targeted stack checks after rebasing onto current
`main`:
- Rust crate tests for
config/hooks/cloud-config/backend-client/app-server-protocol
- Filtered `codex-core` and `codex-app-server` `cloud_config_bundle`
tests
- Python generated-file contract test
- `cargo shear --deny-warnings`
- Targeted `argument-comment-lint` for config/hooks
## Why
Closes #25006.
`tui.keymap` currently rejects `F13` even though Codex's terminal event
layer can report higher function keys. This prevents users from using
common remappings such as Caps Lock to `F13`.
## What Changed
- Define a shared portable upper bound of `F24` for stored TUI
keybindings.
- Accept `f13` through `f24` in config normalization and runtime
parsing.
- Allow `/keymap` capture to persist `F13` through `F24`.
- Update the unsupported-function-key error and add boundary tests for
`F13`, `F24`, and `F25`.
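The boundary behavior can be illustrated with a small parser. This is only a sketch of the idea, not the actual `codex-tui` code: `MAX_FUNCTION_KEY` and `parse_function_key` are hypothetical names, and the error strings are made up for the example.

```rust
/// Portable upper bound for stored function-key bindings (F24, per the change above).
pub const MAX_FUNCTION_KEY: u8 = 24;

/// Parse a key name such as "f13" or "F24" into its function-key number.
/// Hypothetical helper; error wording is illustrative only.
pub fn parse_function_key(name: &str) -> Result<u8, String> {
    let lower = name.trim().to_ascii_lowercase();
    let digits = lower
        .strip_prefix('f')
        .ok_or_else(|| format!("`{}` is not a function key", name))?;
    // Reject empty suffixes and signs such as "f+5" before parsing.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("`{}` is not a function key", name));
    }
    let number: u8 = digits
        .parse()
        .map_err(|_| format!("`{}` is not a function key", name))?;
    if (1..=MAX_FUNCTION_KEY).contains(&number) {
        Ok(number)
    } else {
        Err(format!(
            "only F1 through F{} can be stored, got `{}`",
            MAX_FUNCTION_KEY, name
        ))
    }
}
```

Keeping the bound in one constant means config normalization, runtime parsing, and `/keymap` capture can share the same limit.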
## How to Test
1. Add a binding such as:
```toml
[tui.keymap.global]
open_transcript = "f13"
```
2. Start Codex and press the remapped `F13` key.
3. Confirm Codex loads the config without the previous `F1 through F12`
error and the action runs.
4. Open `/keymap`, capture `F13` for an action, and confirm the saved
binding is `f13`.
5. As a regression check, try to capture `F25` and confirm Codex reports
that only `F1` through `F24` can be stored.
Targeted tests:
- `just test -p codex-config`
- `just test -p codex-tui function_keys`
Full `just test -p codex-tui` completed with 2,752 passing tests, 4
skipped tests, and two unrelated guardian feature-flag failures:
- `app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
- `app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`
## Summary
- Use normal directory loading for plugin install app metadata so
install avoids forced directory refresh while still loading metadata on
cold cache.
- Continue force-refreshing codex_apps tools for auth state.
- Add regression coverage that pre-warms the directory cache and asserts
install returns cached app metadata without extra directory requests.
## Validation
- just fmt
- git diff --check
- just test -p codex-app-server plugin_install_returns_apps_needing_auth
plugin_install_filters_disallowed_apps_needing_auth (blocked locally:
cargo-nextest is not installed)
## Description
Bedrock currently only supports the implicit `default` service tier for
GPT models. This PR strips non-default service tier metadata from
Bedrock model catalogs so Codex does not advertise or send unsupported
tiers.
## What changed
- Normalize both built-in and configured Bedrock catalogs to
default-only service tier behavior.
- Add regression coverage for built-in and configured Bedrock catalogs.
## Validation
- `just fmt`
- `just test -p codex-model-provider`
## Summary
- rename the multi-agent v2 follow-up task tool surface to assign_task
- update core tests and spec-plan expectations
- keep rollout-trace classification backward-compatible with legacy
followup_task
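The backward-compatible classification might look roughly like the sketch below, where both the new and the legacy tool name map to the same category. `ToolKind` and `classify_tool` are hypothetical names for illustration; only the tool names `assign_task` and `followup_task` come from the summary.

```rust
/// Hypothetical classification used when reading rollout traces.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolKind {
    AssignTask,
    Other,
}

/// Treat the legacy `followup_task` name as an alias of `assign_task`
/// so traces recorded before the rename still classify the same way.
pub fn classify_tool(name: &str) -> ToolKind {
    match name {
        "assign_task" | "followup_task" => ToolKind::AssignTask,
        _ => ToolKind::Other,
    }
}
```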
## Tests
- just fmt
- just test -p codex-core
multi_agents_spec::tests::assign_task_tool_requires_message_and_has_no_output_schema
- just test -p codex-rollout-trace
- just fix -p codex-core
- just fix -p codex-rollout-trace
Note: a broad just test -p codex-core run was attempted locally, but
this sandbox produced unrelated environment failures around
sandbox-exec, missing test_stdio_server, and realtime timeouts.
## Problem
Saved threads can already be archived through app-server RPCs, but the
command line did not expose direct archive or unarchive commands.
## Solution
Add `codex archive <thread>` and `codex unarchive <thread>`, resolving
UUIDs or exact thread names before calling the existing `thread/archive`
and `thread/unarchive` RPCs. The commands support scoped remote flags so
callers can target remote app-server endpoints when archiving or
unarchiving threads.
This also fixes a long-standing bug in `codex resume <thread id>` and
`codex fork <thread id>` that I found when testing the new commands.
These operations shouldn't be allowed on archived sessions. They now
fail with an error that tells the user to run `codex unarchive <thread
id>` first.
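A minimal sketch of the archived-thread guard, assuming the archived flag has already been looked up. `check_resumable` is a hypothetical helper and the exact error wording is an assumption; the description only states that the error points the user at `codex unarchive <thread id>`.

```rust
/// Reject resume/fork on an archived thread and name the command that fixes it.
/// Hypothetical helper; the real error text may differ.
pub fn check_resumable(thread_id: &str, archived: bool) -> Result<(), String> {
    if archived {
        Err(format!(
            "thread {thread_id} is archived; run `codex unarchive {thread_id}` first"
        ))
    } else {
        Ok(())
    }
}
```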
## Verification
Added app-server coverage for rejecting archived thread resume by id and
checking that the error includes the matching `codex unarchive <thread
id>` command.
## Why
Users following the Amazon Bedrock API-key setup can export
`AWS_BEARER_TOKEN_BEDROCK` and `AWS_REGION`, but Codex's bearer-token
auth path only accepted `model_providers.amazon-bedrock.aws.region`.
That made the documented env-based setup fail with a missing-region
error even though the standard AWS region environment variable was
present.
## What Changed
- Updates Bedrock bearer-token region resolution to use
`model_providers.amazon-bedrock.aws.region` first, then fall back to
`AWS_REGION`, then `AWS_DEFAULT_REGION`.
- Updates the missing-region error to list all supported region sources.
- Adds focused coverage for config precedence, `AWS_REGION`,
`AWS_DEFAULT_REGION`, and the missing-region failure.
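The precedence order can be sketched as a small function. This is an illustration under assumptions: `resolve_bedrock_region` is a hypothetical name, the environment is injected as a closure so the example is testable, treating blank values as unset is an assumption, and the error text is made up.

```rust
/// Resolve the Bedrock region: config first, then AWS_REGION, then AWS_DEFAULT_REGION.
/// `env` is injected (instead of calling `std::env::var`) to keep the sketch testable.
pub fn resolve_bedrock_region(
    configured: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
) -> Result<String, String> {
    // Assumption: blank values are treated the same as missing values.
    fn non_empty(value: String) -> Option<String> {
        let trimmed = value.trim();
        if trimmed.is_empty() {
            None
        } else {
            Some(trimmed.to_string())
        }
    }

    configured
        .map(str::to_string)
        .and_then(non_empty)
        .or_else(|| env("AWS_REGION").and_then(non_empty))
        .or_else(|| env("AWS_DEFAULT_REGION").and_then(non_empty))
        .ok_or_else(|| {
            "missing region: set model_providers.amazon-bedrock.aws.region, \
             AWS_REGION, or AWS_DEFAULT_REGION"
                .to_string()
        })
}
```

In production code the closure would typically be `|key| std::env::var(key).ok()`.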
## Summary
- Use the session-loaded plugin app IDs as the source of connector
suggestion candidates.
- Remove the redundant plugin reload from
`tool_suggest_connector_ids()`.
- Add regression coverage for connectors declared by a loaded remote
plugin, using the Databricks app case.
## Context
Loaded remote plugins can declare app connector IDs in `.app.json`. The
session-owned `PluginsManager` already loads those plugins and exposes
their effective app IDs.
The connector suggestion path was creating a separate `PluginsManager`
and recomputing plugin app IDs. That new manager does not share the
session manager’s remote installed plugin cache, so app IDs from loaded
remote plugins were missing from connector suggestions.
## Fix
Pass the already-loaded effective app IDs into connector suggestion
generation and use them directly as the plugin-derived connector
candidate set.
Connector candidates are now built from:
- App IDs declared by loaded plugins
- Explicitly configured connector discoverables
- Existing disabled-suggestion filtering
This avoids a second plugin-manager lookup and keeps connector
suggestions aligned with the plugins actually loaded for the turn.
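The candidate construction above amounts to a set union followed by a filter. The sketch below shows that shape only; `connector_candidates` and its parameter names are hypothetical, and the real code works with richer types than plain string IDs.

```rust
use std::collections::BTreeSet;

/// Build connector suggestion candidates from already-loaded plugin app IDs
/// plus explicitly configured discoverables, minus disabled suggestions.
pub fn connector_candidates(
    loaded_plugin_app_ids: &[&str],
    configured_discoverables: &[&str],
    disabled_suggestions: &[&str],
) -> BTreeSet<String> {
    loaded_plugin_app_ids
        .iter()
        .chain(configured_discoverables.iter())
        .filter(|id| !disabled_suggestions.contains(*id))
        .map(|id| id.to_string())
        .collect()
}
```

Because the loaded app IDs are passed in, no second plugin manager is needed to compute them.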
## Behavior
For example, when a plugin is loaded and its `.app.json` declares data
apps, `list_available_plugins_to_install` can now return those data
connectors.
This does not create plugin suggestions from the plugin itself. Plugin
suggestions still come from eligible uninstalled entries in the
marketplace catalog and remain subject to the existing matching/filtering rules.
## Validation
- `just fmt`
- Added regression coverage for a loaded-plugin connector ID appearing
in discoverable tools
- Attempted `just test -p codex-core`; the command exited unsuccessfully
in the local test environment without useful failure detail captured in
the run output
# Why
Managed requirements can already constrain sandbox policy choices, but
Windows sandbox implementation selection was still resolved
independently from those requirements. That left the TUI able to
continue through the unelevated fallback even when an organization wants
to require the elevated Windows sandbox implementation.
# What
- Add `[windows].allowed_sandbox_implementations` requirements support
for the Windows `elevated` and `unelevated` implementations.
- Apply that allowlist during core config resolution so disallowed
configured or feature-selected Windows sandbox implementations fall back
to an allowed implementation with the existing requirements warning
path.
- Reuse the existing TUI Windows setup prompts to block disallowed
unelevated continuation, keep required elevated setup in front of the
user, and refuse to persist a TUI-selected Windows sandbox mode that
requirements disallow.
# Semantics
| Allowed | Selected | Effective |
| --- | --- | --- |
| `["elevated"]` | `unelevated` / unset | `elevated` |
| `["unelevated"]` | `elevated` / unset | `unelevated` |
| `["elevated", "unelevated"]` | `elevated` | `elevated` |
| `["elevated", "unelevated"]` | `unelevated` | `unelevated` |
| `["elevated", "unelevated"]` | unset | `elevated` |
Availability is handled by interactive setup surfaces after allowlist
resolution. If the effective elevated implementation is not ready,
elevated-only requirements block on setup. When unelevated is also
allowed, the UI may offer the existing unelevated fallback.
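The table above can be read as a small resolution function. The sketch below reproduces those five rows under the assumption that `elevated` is preferred whenever the selection is unset or disallowed; `SandboxImpl` and `effective_impl` are hypothetical names, and readiness/setup handling is intentionally left out.

```rust
/// The two Windows sandbox implementations named in the requirements.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxImpl {
    Elevated,
    Unelevated,
}

/// Resolve the effective implementation from the allowlist and the selection.
/// Returns `None` only when the allowlist is empty (a case the table does not cover).
pub fn effective_impl(
    allowed: &[SandboxImpl],
    selected: Option<SandboxImpl>,
) -> Option<SandboxImpl> {
    // An allowed explicit selection always wins.
    if let Some(choice) = selected {
        if allowed.contains(&choice) {
            return Some(choice);
        }
    }
    // Otherwise fall back to the first allowed implementation, preferring elevated.
    [SandboxImpl::Elevated, SandboxImpl::Unelevated]
        .into_iter()
        .find(|candidate| allowed.contains(candidate))
}
```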
## TUI Screens
If elevated setup is not already complete:
```
Your organization requires the default Codex agent sandbox to continue. Set it up to protect your files and control
network access.
Learn more <https://developers.openai.com/codex/windows>
› 1. Set up default sandbox (requires Administrator permissions)
2. Quit
```
If admin setup fails under `["elevated"]`:
```
Couldn't set up your sandbox with Administrator permissions
Your organization requires the default sandbox before Codex can continue.
Learn more <https://developers.openai.com/codex/windows>
› 1. Try setting up admin sandbox again
2. Quit
```
# Next Steps
- extend the requirements/readout surface, such as
`configRequirements/read`, so clients can inspect the loaded
`[windows].allowed_sandbox_implementations` requirement instead of
inferring it from Windows setup state
- consider extending `windowsSandbox/readiness` as well
- update the App startup guide, setup flow, and banner surfaces so an
elevated-only requirement omits any continue-unelevated escape hatch and
blocks startup until a permitted implementation is ready
- preserve the existing unelevated fallback path when requirements allow
it, including the `["unelevated"]` case where elevated is disallowed
## Summary
- Keep the original `TOOL_SUGGEST_DISCOVERABLE_PLUGIN_ALLOWLIST` as a
fallback seed list, so users with no installed plugins still get initial
install suggestions.
- Allow additional install suggestions from trusted marketplaces:
`openai-curated` and `openai-bundled`.
- Require non-fallback, non-configured marketplace candidates to share
`.app.json` connector IDs with already installed plugins.
- Preserve explicit configured plugin discoverables as an override,
while still omitting installed, disabled, and `NOT_AVAILABLE` plugins.
## Context
`list_available_plugins_to_install` controls which plugins the model can
trigger via `request_plugin_install`. We want a small starter set for
empty/new users, but we also want installed workflow plugins to unlock
relevant source plugins without maintaining every source plugin ID by
hand.
This keeps the legacy plugin ID allowlist only as the starter fallback.
For everything else, the trusted marketplace is the candidate boundary,
and installed app connector overlap is the relevance filter. For
example, an installed Sales plugin can make HubSpot and Granola
suggestible when those source plugins are in `openai-curated` and share
Sales app connector IDs, while an unrelated test-source plugin with an
app connector not declared by Sales stays hidden.
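The relevance filter described above can be sketched as a predicate over one marketplace candidate. This is a simplified model, not the actual implementation: `Candidate` and `is_suggestible` are hypothetical, fallback seed IDs are assumed to be always eligible, and the installed/disabled/`NOT_AVAILABLE` guardrails and configured overrides are omitted.

```rust
use std::collections::BTreeSet;

/// Marketplaces named in the summary as trusted sources of extra suggestions.
const TRUSTED_MARKETPLACES: &[&str] = &["openai-curated", "openai-bundled"];

/// Hypothetical view of one uninstalled marketplace entry.
pub struct Candidate<'a> {
    pub plugin_id: &'a str,
    pub marketplace: &'a str,
    pub connector_ids: &'a [&'a str],
}

/// A candidate is suggestible if it is a fallback seed, or if it lives in a
/// trusted marketplace and shares a connector ID with an installed plugin.
pub fn is_suggestible(
    candidate: &Candidate<'_>,
    fallback_plugin_ids: &[&str],
    installed_connector_ids: &BTreeSet<&str>,
) -> bool {
    if fallback_plugin_ids.contains(&candidate.plugin_id) {
        return true;
    }
    TRUSTED_MARKETPLACES.contains(&candidate.marketplace)
        && candidate
            .connector_ids
            .iter()
            .any(|id| installed_connector_ids.contains(id))
}
```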
## Test Coverage
- Empty/no-installed-plugin case: returns the fallback seed plugins from
the original allowlist.
- Installed-app expansion: returns non-fallback marketplace plugins only
when their app connector IDs overlap with an installed plugin.
- Sales workflow case: installed Sales declares HubSpot and Granola
apps, so `hubspot@openai-curated` and `granola@openai-curated` are
returned.
- Sales negative case: `test-source@openai-curated` has an app connector
not declared by Sales, so it is not returned.
- Existing guardrails: installed plugins, disabled suggestions, and
`NOT_AVAILABLE` plugins remain omitted; explicit configured
discoverables still work as an override.
## Validation
- `just fmt`
- `just test -p codex-core plugins::discoverable::tests`
- `just test -p codex-core` was attempted earlier, but current `main` /
local env failed with unrelated existing failures around missing
`test_stdio_server`, CLI/code-mode MCP tool setup, and
unified_exec/shell snapshot flakes/timeouts. The touched discoverable
tests pass.
## Summary
- add Vim normal-mode `s` support to substitute the character under the
cursor and enter insert mode
- fix Vim normal-mode `o` so opening below the final line moves the
cursor onto the new blank line
- update keymap config/schema and keymap picker snapshots for the new
action
## Validation
- `just fmt`
- `just write-config-schema`
- `just test -p codex-config`
- focused `just test -p codex-tui` coverage for the Vim `s` and `o`
behavior, keymap conflict handling, and keymap picker snapshots
- `cargo insta pending-snapshots --manifest-path tui/Cargo.toml`
- `git diff --check`
## Notes
A full `just test -p codex-tui` run still has two unrelated Guardian
feature-flag failures in this checkout:
- `app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
- `app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`
## Summary
- preserve macOS `__CF_USER_TEXT_ENCODING` when launching the sandboxed
fs helper
- keep the fs-helper env narrow; this adds only the CoreFoundation
startup var instead of copying the broader MCP stdio baseline
- add focused coverage that the helper keeps that var without admitting
`HOME`
## Diagnosis
The sandboxed fs helper is not launched like a normal child process.
Exec-server rebuilds its environment from an allowlist, then calls
`env_clear()` before re-execing Codex with `--codex-run-as-fs-helper`.
That helper dispatches before the normal Codex startup path and only
needs to boot a small Tokio runtime, read one JSON request from stdin,
perform the direct filesystem operation, and write one JSON response.
The reported macOS hang sampled the helper before Rust main, in
CoreFoundation initialization while resolving the default text encoding:
`_CFStringGetUserDefaultEncoding -> getpwuid_r -> notify_register_check
-> bootstrap_look_up3 -> mach_msg2_trap`. The fs-helper allowlist kept
`PATH` and temp vars for runtime needs, but it dropped macOS
`__CF_USER_TEXT_ENCODING`. Other Codex subprocess launchers that
intentionally build a minimal Unix baseline, such as MCP stdio, already
preserve that variable.
My read is that stripping `__CF_USER_TEXT_ENCODING` forced this internal
helper down CoreFoundation's fallback user-lookup path, and that lookup
intermittently wedged on the affected machine before the helper could
read stdin or touch the target file. Preserving only this macOS startup
variable avoids that fallback without broadening the fs-helper
environment to shell-like vars such as `HOME`, `USER`, locale settings,
terminal settings, or proxy credentials.
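The allowlist-then-`env_clear()` pattern described above might look roughly like this. The allowlist contents are an assumption for illustration (the description only names `PATH`, temp vars, and `__CF_USER_TEXT_ENCODING`), `fs_helper_env` and `configure_helper` are hypothetical names, and the sketch does not gate the CoreFoundation variable on macOS.

```rust
use std::process::Command;

/// Hypothetical allowlist: `PATH`, common temp-dir vars, and the macOS
/// CoreFoundation startup variable. `HOME`, `USER`, etc. are deliberately absent.
const FS_HELPER_ENV_ALLOWLIST: &[&str] =
    &["PATH", "TMPDIR", "TMP", "TEMP", "__CF_USER_TEXT_ENCODING"];

/// Keep only allowlisted variables from the parent environment.
pub fn fs_helper_env<I>(parent_env: I) -> Vec<(String, String)>
where
    I: IntoIterator<Item = (String, String)>,
{
    parent_env
        .into_iter()
        .filter(|(key, _)| FS_HELPER_ENV_ALLOWLIST.contains(&key.as_str()))
        .collect()
}

/// Clear the inherited environment, then apply only the filtered variables.
pub fn configure_helper(cmd: &mut Command, env: &[(String, String)]) {
    cmd.env_clear();
    cmd.envs(env.iter().map(|(k, v)| (k.as_str(), v.as_str())));
}
```

A caller would pass `std::env::vars()` as `parent_env` and then hand the result to `configure_helper` before spawning the helper.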
Internal Slack thread omitted from the public PR body.
## Validation
- `cd codex-rs && just fmt`
- `git diff --check`
## Summary
This adds `environment: issue-triage` to the Codex-calling issue
workflow jobs so they can read the GitHub Environment Secret while
staying on GitHub-hosted runners for public issue-triggered workflows.
## Why
The standalone `/v1/alpha/search` request now requires a `model`, but
the `web.run` extension currently omits it.
This PR adds `model` to the extension `ToolCall` invocation.
Follow-up to #23823.
## What changed
- Make `SearchRequest.model` required.
- Expose the effective per-turn model on extension tool calls and pass
it in standalone web-search requests.
- Assert the model is forwarded in the app-server round-trip test.
## Testing
- `just test -p codex-api -p codex-tools -p codex-web-search-extension
-p codex-memories-extension -p codex-goal-extension`
- `just test -p codex-core -E
'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'`
- `just test -p codex-app-server -E
'test(standalone_web_search_round_trips_encrypted_output)'`
## Why
`SandboxPolicy` is the legacy compatibility shape, but
`codex-thread-store` still exposed it through `StoredThread`,
`ThreadMetadataPatch`, and live metadata sync. That kept thread-store
consumers tied to the legacy representation and meant richer permission
profile data could not round-trip through thread metadata or cold
rollout reconciliation.
## What Changed
- Replaced thread-store `sandbox_policy` API fields with canonical
`PermissionProfile` fields.
- Persist new permission-profile metadata as canonical JSON in the
existing SQLite metadata slot while continuing to read older legacy
sandbox policy values.
- Updated local, in-memory, live metadata sync, and rollout extraction
paths to propagate `TurnContextItem::permission_profile()`.
- Re-materialize legacy permission metadata against the final rollout
cwd when rollout-derived metadata replaces stale SQLite summaries.
- Updated affected app-server and core test constructors to build
`PermissionProfile` values directly.
## Test Plan
- `cargo test -p codex-state`
- `cargo test -p codex-thread-store`
- `cargo test -p codex-app-server
summary_from_stored_thread_preserves_millisecond_precision --lib`
- `cargo test -p codex-core realtime_context --lib`
## Summary
Introduce a `CodeModeSession` interface for executing and managing
code-mode cells.
This moves cell lifecycle, callback delegation, termination, and
shutdown behind a session abstraction. The existing in-process
implementation remains in use, and an external-process implementation
can later be added behind the same interface.
A Codex session owns one `CodeModeSession`, which in turn owns its
running cells and stored code-mode state. Each cell is represented to
the caller as a `StartedCell`, exposing its cell ID and initial
response.
It also introduces a `CodeModeSessionDelegate` callback interface. A
session uses the delegate to invoke nested host tools and emit
notifications while a cell is running, allowing the runtime to
communicate with its owning Codex session without depending directly on
core turn handling.
<img width="2121" height="1001" alt="image"
src="https://github.com/user-attachments/assets/c349a819-2a59-485c-bda4-2caf68ac4c31"
/>
## Summary
- terminate sandbox filesystem helpers when the Tokio child handle is
dropped
## Why
A sandbox filesystem helper can stall during process startup before
reading stdin. If the owning async operation is cancelled or torn down,
the spawned helper should not remain running as an orphaned process.
Setting `kill_on_drop(true)` gives the filesystem helper the cleanup
behavior that Tokio child processes otherwise do not enable by default.
This intentionally does not add a timeout. It does not detect or recover
an active hung file edit while the owning future remains alive. A more
precise startup-health mechanism can be handled separately.
## Validation
- `just test -p codex-exec-server` (186 tests passed; benchmark smoke
passed)
- `just fmt`
- `just fix -p codex-exec-server`
- `git diff --check`
## Why
We recently added `forked_from_thread_id` which lets us trace where a
thread's _context_ comes from, but we also want to understand subagent
lineage (e.g. which parent thread spawned this subagent? what kind of
subagent is it?) which is orthogonal.
This PR adds `parent_thread_id` and `subagent_kind` to the
`x-codex-turn-metadata` header sent to the Responses API.
## What changed
- Adds `parent_thread_id` and `subagent_kind` to core-owned
`x-codex-turn-metadata`.
- Restores persisted `SessionSource` and `ThreadSource` from resumed
session metadata so cold-resumed subagent threads keep their lineage on
later Responses API requests.
- Centralizes parent-thread extraction on `SessionSource` /
`SubAgentSource` and reuses it in the Responses client, analytics, agent
control, and state parsing paths.
- Extends reserved-key, git-enrichment, thread-spawn, and app-server v2
metadata coverage for the new lineage fields.
## Verification
- Not run locally per request.
- Added focused coverage in `core/src/turn_metadata_tests.rs` and
`app-server/tests/suite/v2/client_metadata.rs`.
## Why
TUI users can archive saved sessions from other surfaces, but there is
no in-session command for archiving the active session. Since archiving
the active session also exits the TUI, the command should ask for
explicit confirmation instead of firing immediately.
I'm also working on [a companion
PR](https://github.com/openai/codex/pull/25021) that adds `codex
archive` and `codex unarchive` top-level CLI commands.
## What changed
- Adds a new `/archive` slash command described as `archive this session
and exit`.
- Shows a confirmation dialog with `No, don't archive` selected first
and `Yes, archive and exit` as the explicit action.
- On confirmation, calls the existing `thread/archive` app-server RPC
for the active main session and exits after success.
- Keeps `/archive` disabled while a task is running and unavailable in
side conversations.
## Verification
Added focused TUI coverage for the `/archive` confirmation flow,
disabled-while-task-running behavior, and the `/ar` slash-command popup
snapshot.
## Summary
The desktop app now presents the on-request permissions mode as `Ask for
approval` and the manual-review-backed mode as `Approve for me`. The TUI
still exposed older/internal labels like `Default` and `Auto-review`,
which made the same underlying settings look different across clients.
This updates the TUI UX copy to match the app without changing the
underlying default behavior. Fresh threads continue to use the existing
on-request approval mode, now displayed as `Ask for approval`.
The label changes cover `/permissions`, explicit profile permissions
menus, status surfaces, config persistence history/error text, and the
corresponding TUI snapshots.
### Before
<img width="1181" height="119" alt="Screenshot 2026-05-28 at 10 19
47 PM"
src="https://github.com/user-attachments/assets/0664846b-b6dd-4931-b4dd-d0af0d42058e"
/>
<img width="523" height="19" alt="Screenshot 2026-05-28 at 10 21 29 PM"
src="https://github.com/user-attachments/assets/7899c33e-b35d-4684-8389-97e357803423"
/>
### After
<img width="1216" height="117" alt="Screenshot 2026-05-28 at 10 19
32 PM"
src="https://github.com/user-attachments/assets/015aab43-ac97-411f-8031-75cdd887251b"
/>
<img width="567" height="18" alt="Screenshot 2026-05-28 at 10 20 24 PM"
src="https://github.com/user-attachments/assets/28b6422c-b823-4298-b221-c83d46d09d66"
/>
## Why
Some Windows users do not have local admin access, so they cannot
complete the elevated portion of the Windows sandbox setup when Codex
first needs it. This adds an alpha provisioning path that an admin or IT
deployment script can run ahead of time for the Codex user.
The intended managed-deployment shape is:
```powershell
codex sandbox setup --elevated --user "$env:COMPUTERNAME\Alice" --codex-home "C:\Users\Alice\.codex"
```
`--elevated` is treated as the requested sandbox setup level, not as
proof that the process is elevated. The Windows sandbox setup
orchestration still checks that the caller is actually elevated before
launching the helper without a UAC prompt.
## What changed
- Added `codex sandbox setup --elevated` with explicit user selection
via either `--current-user` or `--user ... --codex-home ...`.
- Moved the CLI implementation into `cli/src/sandbox_setup.rs` instead
of growing `cli/src/main.rs`.
- Added a Windows sandbox `ProvisionOnly` helper mode that runs the
elevation-required provisioning work without requiring a workspace cwd
or runtime sandbox policy.
- Reused the existing elevated helper path for creating/updating sandbox
users, configuring firewall/WFP rules, and applying sandbox directory
ACLs.
- Persisted `windows.sandbox = "elevated"` into the target `CODEX_HOME`
so the desktop app does not show the initial sandbox setup banner after
pre-provisioning succeeds.
## Validation
- `cargo fmt -p codex-windows-sandbox -p codex-core -p codex-cli`
- `cargo test -p codex-cli sandbox_setup --target-dir
target\sandbox-setup-check`
- `cargo test -p codex-windows-sandbox
payload_accepts_provision_only_mode --target-dir
target\sandbox-setup-check`
- `git diff --check`
- Manual Windows alpha flow with a standard local user (`Mandi Lavida`):
ran the new setup command from an admin shell, verified the target
`.codex` contents, sandbox marker/secrets, ACLs, firewall rules, and
desktop startup without the sandbox setup banner once experimental
network proxy requirements were disabled.
## Notes
This intentionally does not solve later elevated update coordination for
IT-managed deployments. The setup command can still apply provisioning
updates when run again, but a broader coordination/process story is out
of scope for this alpha.
## Why
The standalone `image_gen.imagegen` extension should behave like native
image generation for artifact persistence and UI completion, while
returning its save-location guidance as part of the tool result instead
of injecting a developer message.
## What Changed
- Added an image-generation completion hook for extension tools so core
can persist generated images and emit the existing `ImageGeneration`
lifecycle events.
- Reused core image artifact persistence for extension output and
removed extension-local save-path/file-writing logic.
- Split shared image persistence from built-in finalization so native
image generation keeps its existing developer-message instruction
behavior.
- Returned the generated image save-location instruction through the
extension `FunctionCallOutput`, alongside the generated image input for
model follow-up.
- Preserved the existing image-generation event shape for current UI and
replay compatibility.
- Avoided cloning the full generated-image base64 payload when emitting
the in-progress image item.
- Removed dependencies no longer needed after moving persistence out of
the extension crate.
## Fast Follow
- Adjust the existing Extension API and add a general `TurnItem`
finalization path so this code can be reused
## Validation
- Ran `just fmt`.
- Ran `just bazel-lock-update`.
- Ran `just bazel-lock-check`.
- Ran `just test -p codex-tools -p codex-extension-api -p
codex-image-generation-extension`.
- Ran `just test -p codex-core
image_generation_publication_is_finalized_by_core`.
- Ran `just test -p codex-core
handle_output_item_done_records_image_save_history_message`.
- Ran `just fix -p codex-tools -p codex-extension-api -p codex-core -p
codex-image-generation-extension`.
Ensures MCP-backed `codex-core` integration tests exercise initialized
servers instead of racing server startup.
I've been idly investigating a few flakes and the failure modes are much
more confusing when a tool call fails because of a failed server start
than when the failed server start causes the test to fail directly.
Adds failure-only logging for MCP streamable HTTP post_message calls and
the underlying reqwest send path, capturing the MCP method/request id,
endpoint shape, auth-header presence, timeout/connect classification,
and sanitized error source chain without logging headers, bodies,
tokens, or full URLs.
## Why
`core/src/config/edit.rs` owns the config edit state machine, but it
also carried the TOML document helper code inline as a nested module.
Moving those helpers into their own file keeps the edit orchestration
easier to scan without changing the config persistence behavior.
## What changed
- Moved the existing `document_helpers` module from
`core/src/config/edit.rs` into
`core/src/config/edit/document_helpers.rs`.
- Added `mod document_helpers;` so the existing `pub(super)` helper API
remains available to the rest of `config::edit`.
## Testing
Not run; this is a refactor-only module extraction with no intended
behavior change.
## Why
Standalone `web.run` calls run in the extension, so they need normal
web-search progress activity while a request is in flight and durable
completed activity after a thread is reloaded.
Follow-up to #23823; uses the extension turn-item emission path added in
#24813.
## What changed
- Emit standalone `web.run` start/completion items through the host
turn-item emitter, preserving standard client delivery and rollout
persistence.
- Include useful completion detail for queries, image queries, and
literal-URL `open`/`find` commands.
- Render completed searches as `Searched the web` or `Searched the web
for <detail>`, with snapshot coverage for the detail-free case.
- Extend the app-server round-trip test to verify completed search
activity is reconstructed by `thread/read` after a fresh-process reload.
## Testing
- `just test -p codex-web-search-extension`
- `just test -p codex-app-server -E
"test(standalone_web_search_round_trips_encrypted_output)"`
## Why
Some models need to select their code-execution behavior through model
catalog metadata. Models without that metadata must continue to follow
the existing `CodeMode` and `CodeModeOnly` feature flags, including when
a newer server sends an enum value this client does not recognize.
## What changed
- add optional `ModelInfo.tool_mode` metadata with `direct`,
`code_mode`, and `code_mode_only`
- treat omitted and unknown wire values as `None`
- resolve `None` from the existing feature flags
- carry the resolved `ToolMode` directly on `TurnContext`, outside
`Config`
- use the resolved value for turn creation, model switches, review
turns, tool planning, and code execution
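The fallback rule above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `parse_tool_mode` and `resolve_tool_mode` are hypothetical names, and the precedence of `CodeModeOnly` over `CodeMode` when both flags are set is an assumption.

```rust
/// Tool-mode selector; the variants mirror the wire values listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolMode {
    Direct,
    CodeMode,
    CodeModeOnly,
}

/// Hypothetical parser: omitted and unrecognized wire values both map to `None`.
pub fn parse_tool_mode(wire: Option<&str>) -> Option<ToolMode> {
    match wire? {
        "direct" => Some(ToolMode::Direct),
        "code_mode" => Some(ToolMode::CodeMode),
        "code_mode_only" => Some(ToolMode::CodeModeOnly),
        _ => None,
    }
}

/// Hypothetical resolver: explicit metadata wins, otherwise the feature flags decide.
/// Assumption: `code_mode_only` takes precedence when both flags are enabled.
pub fn resolve_tool_mode(
    metadata: Option<ToolMode>,
    code_mode: bool,
    code_mode_only: bool,
) -> ToolMode {
    metadata.unwrap_or(if code_mode_only {
        ToolMode::CodeModeOnly
    } else if code_mode {
        ToolMode::CodeMode
    } else {
        ToolMode::Direct
    })
}
```

With this shape, a newer server sending an unknown selector behaves exactly like a server that sends none.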
## Coverage
- add protocol coverage for omitted, known, and unknown enum values
- add focused coverage for flag fallback and explicit metadata
overriding feature flags
- add core integration coverage that fetches remote model metadata
through `/v1/models` and verifies the outbound `/responses` tools for
explicit `direct` and `code_mode_only` selectors
## Stack
- followed by #25032
# Why
Fixes #24529. Completed hook output in the TUI rendered each
`HookOutputEntry` as one ratatui line, so explicit newlines inside hook
output were not shown as separate transcript rows. That made multiline
`SessionStart.additionalContext` hard to inspect even though the
model-facing context path preserved the original text.
# What
- Split completed hook output entries on explicit newlines before
rendering them in `codex-rs/tui/src/history_cell/hook_cell.rs`.
- Keep the hook output prefix, such as `hook context:` or `warning:`, on
the first physical line only.
- Preserve explicit blank lines and render continuation lines with the
hook body indent.
- Add unit coverage for multiline context and warning output, plus a
chatwidget snapshot regression for `SessionStart` history output.
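As a rough sketch of the splitting rule described above (not the actual `hook_cell.rs` code): `render_hook_entry` is a hypothetical helper that returns plain strings instead of ratatui lines, and rendering blank lines as empty rows is an assumption.

```rust
/// Hypothetical helper: split one hook output entry into transcript rows.
/// The prefix appears on the first row only; continuation rows get the body
/// indent, and explicit blank lines are preserved as empty rows.
pub fn render_hook_entry(prefix: &str, text: &str, indent: &str) -> Vec<String> {
    text.split('\n')
        .enumerate()
        .map(|(index, line)| match (index, line.is_empty()) {
            (0, _) => format!("{prefix} {line}"),
            (_, true) => String::new(),
            (_, false) => format!("{indent}{line}"),
        })
        .collect()
}
```

Splitting on `'\n'` rather than using `str::lines` keeps trailing and interior blank lines, which matters for multiline `SessionStart.additionalContext`.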
# Testing
- `cargo nextest run -p codex-tui completed_hook_multiline
hook_completed_before_reveal_renders_completed_without_running_flash`
- `just argument-comment-lint -p codex-tui -- --ignore-rust-version
--lib --tests`
## Summary
Remove a stale `TODO(jif)` block of commented-out rollout listing tests
that still referenced an older listing API.
The current rollout listing behavior is covered by the active state DB
and filesystem fallback tests, so keeping the dead commented tests just
adds noise.
## Validation
- `just fmt`
- `just test -p codex-rollout`
## Summary
- handle goal usage-limit turn errors in the goal extension
- exercise the extension path in the goal backend test
## Tests
- `just fmt`
- `just test -p codex-goal-extension`
- `just fix -p codex-goal-extension`
## Summary
- Clarify default, omission, and bounded behavior across built-in tool
schemas, including unified exec, classic shell, Code Mode exec/wait,
multi-agent, agent job, MCP resource, image, goal, plan, tool_search,
and test-sync fields.
- Convert update_plan status to an enum and add short field descriptions
where the schema previously relied on surrounding context.
- Remove the dedicated permission-approval schema test and keep only
updates to existing expected-spec tests.
## Validation
- Ran `just fmt`.
- Ran `git diff --check`.
- Did not run clippy or tests, per request.
Regressions were evaluated
[here](https://openai.slack.com/archives/C09GDSP1J9X/p1779905065496949),
and none were found.
Deletes `codex-rs/debug-client/src/state.rs` as one step in removing the
stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/src/reader.rs` as one step in removing
the stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/src/output.rs` as one step in removing
the stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/src/main.rs` as one step in removing the
stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/src/commands.rs` as one step in removing
the stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/src/client.rs` as one step in removing
the stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/README.md` as one step in removing the
stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/Cargo.toml` as one step in removing the
stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
## Why
This PR is stacked on #24918, which moves goal steering onto
source-labeled internal model context fragments. Active-turn goal
steering should use the same running-turn injection path as other
runtime steering, so those fragments enter the pending input queue as
`ResponseItem`s through the existing
[`Session::inject_if_running`](8d6f6cdf69/codex-rs/core/src/session/inject.rs (L12-L27))
behavior instead of through a goal-specific conversion wrapper.
## What Changed
- Exposes a narrow `CodexThread::inject_if_running` bridge for callers
that only hold a thread handle.
- Changes `ext/goal` active-turn steering to pass `ResponseItem`s
directly.
- Builds goal steering prompts as contextual internal model context
`ResponseItem`s before injecting them into the running turn.
## Testing
Not run locally; PR metadata update only.
## Why
Goal steering is one form of runtime-owned model context, but the old
`<goal_context>` wrapper made the contextual-fragment hiding path
goal-specific. Using a source-labeled internal context fragment gives
core and extensions a shared shape for hidden model steering while
keeping those prompts out of visible turn history.
The change also keeps legacy `<goal_context>` messages recognized as
hidden contextual input so existing stored history does not start
rendering old goal-steering prompts as user-visible turn items.
## What Changed
- Replaces `GoalContext` with `InternalModelContextFragment` plus a
validated `InternalContextSource`.
- Renders goal steering as `<codex_internal_context
source="goal">...</codex_internal_context>`.
- Updates core goal steering and `ext/goal` steering to inject the new
internal-context fragment.
- Updates contextual-fragment, event-mapping, goal, and session tests
for the new wrapper.
## Test Coverage
- Adds coverage for detecting the new internal model context fragment.
- Preserves coverage for hiding legacy `<goal_context>` fragments.
- Verifies invalid internal context sources are rejected and arbitrary
context tags are not hidden.
- Updates goal steering/session assertions to expect the new
`source="goal"` wrapper.
## Summary
`fs/watch` was using a local debounce wrapper whose deadline was
initialized once and then reused after the first batch. Once that stale
deadline was in the past, later file changes could bypass the intended
200ms debounce and send noisier `fs/changed` notifications.
This moves the debounce wrapper into `codex-file-watcher` as
`DebouncedWatchReceiver`, resets the debounce deadline for each event
batch, preserves pending paths across cancelled receives, and updates
app-server `fs/watch` to use the shared wrapper.
Fixes #24692.
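The per-batch deadline reset can be illustrated with a small synchronous sketch. This is not `DebouncedWatchReceiver` itself: the `Debouncer` type and its methods are hypothetical, and time is passed in explicitly to keep the example deterministic.

```rust
use std::time::{Duration, Instant};

/// Hypothetical debouncer: one deadline per batch, never reused across batches.
pub struct Debouncer {
    window: Duration,
    deadline: Option<Instant>,
    pending: Vec<String>,
}

impl Debouncer {
    pub fn new(window: Duration) -> Self {
        Self { window, deadline: None, pending: Vec::new() }
    }

    /// Record a changed path. The first event of a batch starts a fresh deadline.
    pub fn record(&mut self, path: &str, now: Instant) {
        if self.deadline.is_none() {
            self.deadline = Some(now + self.window);
        }
        if !self.pending.iter().any(|p| p == path) {
            self.pending.push(path.to_string());
        }
    }

    /// Flush the batch once its deadline has passed. Clearing the deadline here
    /// is what prevents a stale, already-expired deadline from being reused.
    pub fn flush_if_due(&mut self, now: Instant) -> Option<Vec<String>> {
        match self.deadline {
            Some(deadline) if now >= deadline => {
                self.deadline = None;
                Some(std::mem::take(&mut self.pending))
            }
            _ => None,
        }
    }
}
```

Because pending paths live on the struct rather than in a local variable, a caller that abandons a wait does not lose them.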
## Why
Permission profiles can mark filesystem entries as unreadable with
`deny` rules, including glob patterns. Several shell execution paths
treated known-safe commands or execpolicy `allow` rules as sufficient to
run outside the filesystem sandbox. That is not valid for read-capable
commands: for example, `cat` or `ls` may be reasonable to allow
generally, but dropping the sandbox would also drop deny-read
constraints such as `**/*.env`.
## What changed
- Added a shared check that treats active deny-read restrictions as
incompatible with unsandboxed execution.
- Kept first-attempt execution sandboxed for explicit escalation and
execpolicy allow bypasses when deny-read entries are present.
- Prevented no-sandbox retry after a sandbox denial when the active
filesystem policy contains deny-read entries.
- Updated the zsh-fork execve path so prefix-rule `allow` decisions
continue inside the current sandbox when deny-read restrictions are
active.
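A minimal sketch of the shared check, under the assumption that the filesystem policy can be viewed as a list of `(pattern, access)` entries. The `Access` enum and all function names here are hypothetical and do not reflect the real `codex-core` types.

```rust
/// Hypothetical access levels for a filesystem policy entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
    Deny,
}

/// True when any entry (path or glob) marks content as unreadable.
pub fn has_deny_read(entries: &[(&str, Access)]) -> bool {
    entries.iter().any(|(_, access)| *access == Access::Deny)
}

/// A safe-command or execpolicy `allow` bypass only drops the sandbox
/// when no deny-read restriction is active.
pub fn first_attempt_is_sandboxed(bypass_allowed: bool, entries: &[(&str, Access)]) -> bool {
    !bypass_allowed || has_deny_read(entries)
}

/// After a sandbox denial, an unsandboxed retry is only acceptable
/// when it would not discard deny-read constraints.
pub fn may_retry_without_sandbox(entries: &[(&str, Access)]) -> bool {
    !has_deny_read(entries)
}
```

For example, a profile containing `("**/*.env", Access::Deny)` keeps `cat` sandboxed even if an execpolicy rule allows it.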
## Verification
- `cargo test -p codex-core tools::sandboxing::tests`
- `cargo test -p codex-core
tools::runtimes::shell::unix_escalation::tests`
- `cargo test -p codex-core
shell_command_enforces_glob_deny_read_policy`
## Why
When the TUI resumes a thread, transcript replay renders prior user
messages but did not seed the composer history. That leaves the resumed
session with empty in-memory prompt history, so pressing Up can fall
through to persisted global history and surface a prompt from another
thread.
The expected behavior is that prompts from the resumed thread are
recalled first, with global history only as a fallback.
## What changed
- Record replayed user messages into the composer history during resume
replay.
- Preserve the existing persisted history format and avoid any startup
history scan.
- Add focused TUI coverage showing replayed prompts are recalled before
persisted global history.
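The recall order can be sketched as below. `PromptHistory` is a hypothetical stand-in for the composer history, with both lists stored oldest-first; the real TUI types differ.

```rust
/// Hypothetical composer history: thread-local prompts first, global fallback second.
pub struct PromptHistory {
    local: Vec<String>,
    persisted: Vec<String>,
    cursor: usize,
}

impl PromptHistory {
    pub fn new(persisted: Vec<String>) -> Self {
        Self { local: Vec::new(), persisted, cursor: 0 }
    }

    /// Called for each user message encountered during resume replay.
    pub fn record_replayed(&mut self, prompt: &str) {
        self.local.push(prompt.to_string());
    }

    /// Simulates pressing Up: newest local prompt first, then persisted history.
    pub fn up(&mut self) -> Option<String> {
        let recalled = self
            .local
            .iter()
            .rev()
            .chain(self.persisted.iter().rev())
            .nth(self.cursor)
            .cloned();
        if recalled.is_some() {
            self.cursor += 1;
        }
        recalled
    }
}
```

Seeding happens only in memory during replay, so the persisted history format is untouched.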
## Validation
- Added `replayed_user_messages_seed_composer_history` in
`codex-rs/tui/src/chatwidget/tests/history_replay.rs`.
- `just test -p codex-tui replayed_user_messages_seed_composer_history`
passed.
## Summary
This fixes BUGB-17567 by preventing non-Windows command safety
classification from invoking the Windows PowerShell safelist/parser
path.
Previously, `is_known_safe_command` called the Windows PowerShell
classifier on every platform. That classifier recognizes
`pwsh`/`powershell` by basename and delegates script parsing to the
PowerShell AST parser. The parser starts the supplied executable, so on
macOS/Linux a repository-controlled `pwsh` path could execute during
safety parsing before the normal sandboxed command execution path.
The change gates the Windows PowerShell classifier and module behind
`#[cfg(windows)]`. On macOS/Linux, PowerShell-looking commands are no
longer auto-approved by the Windows classifier and instead fall through
to the normal non-Windows safe-command logic.
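The gating pattern looks roughly like the sketch below. The module body and the `ls`/`cat` allowlist are placeholders invented for illustration; only the `#[cfg(windows)]` structure reflects the change described here.

```rust
/// Placeholder for the Windows-only classifier module. On other platforms
/// this module is not compiled at all, so nothing can spawn `pwsh`.
#[cfg(windows)]
mod windows_powershell {
    pub fn is_safe(_command: &[String]) -> bool {
        // The real classifier parses the script; this stub approves nothing.
        false
    }
}

/// Placeholder for the platform-neutral safe-command logic (assumed allowlist).
fn is_safe_by_default_rules(command: &[String]) -> bool {
    matches!(command.first().map(String::as_str), Some("ls") | Some("cat"))
}

pub fn is_known_safe_command(command: &[String]) -> bool {
    // Only Windows builds consult the PowerShell classifier.
    #[cfg(windows)]
    {
        if windows_powershell::is_safe(command) {
            return true;
        }
    }
    is_safe_by_default_rules(command)
}
```

On macOS and Linux a `pwsh -Command ...` invocation therefore reaches only `is_safe_by_default_rules`, which never executes the supplied binary.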
## Validation
- `/private/tmp/codex-tools/bin/just fmt`
- `PATH=/private/tmp/codex-tools/bin:$PATH
/private/tmp/codex-tools/bin/just test -p codex-shell-command`
The focused test run passed 135 tests with 0 skipped and completed the
crate bench-smoke step.
## Notes
This PR is scoped to the BUGB-17567 macOS/Linux path. Windows still uses
the PowerShell classifier; a separate hardening follow-up should ensure
Windows safety parsing only executes a trusted PowerShell parser binary
and does not spawn the command's `argv[0]` when that path may be
repository-controlled.
## Why
`codex-backend` now authenticates remote-control server websocket
connections with short-lived server tokens instead of the user's ChatGPT
access token. `app-server` needs to mint and refresh those server tokens
without persisting them, so a restart can reconnect from durable
enrollment identity while keeping the bearer token memory-only.
## What Changed
Updated the remote-control transport to consume `remote_control_token`
and `expires_at` from server enroll responses and added
`/server/refresh` support for persisted enrollments or expiring cached
tokens.
Websocket handshakes now send `Authorization: Bearer
<remote_control_token>` with the existing server identity headers, and
no longer send the ChatGPT bearer token or `chatgpt-account-id` on that
websocket path.
The in-memory enrollment state now owns the ephemeral server token
cache, while SQLite still persists only `server_id`, `environment_id`,
and `server_name`. Websocket `401`/`403` clears only the cached token
for refresh on reconnect; websocket or refresh `404` clears stale
persisted enrollment and re-enrolls. Response body previews redact
`remote_control_token` before surfacing parse errors.
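The status handling above can be summarized in a small sketch. `EnrollmentState`, `NextStep`, and `on_status` are hypothetical names; dropping the cached token on `404` and treating every other status as a plain reconnect are assumptions.

```rust
/// Hypothetical in-memory view of enrollment: the id stands in for the
/// persisted SQLite row, the token is memory-only.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct EnrollmentState {
    pub persisted_server_id: Option<String>,
    pub cached_token: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    RefreshToken,
    ReEnroll,
    Reconnect,
}

/// Decide what to do after a websocket or refresh response status.
pub fn on_status(state: &mut EnrollmentState, status: u16) -> NextStep {
    match status {
        // Token rejected: keep the durable identity, mint a new token on reconnect.
        401 | 403 => {
            state.cached_token = None;
            NextStep::RefreshToken
        }
        // Enrollment unknown to the backend: drop stale identity and start over.
        404 => {
            state.cached_token = None;
            state.persisted_server_id = None;
            NextStep::ReEnroll
        }
        _ => NextStep::Reconnect,
    }
}
```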
## Verification
- `just test -p codex-app-server-transport`
- Manual prod smoke with an isolated `CODEX_HOME`: `codex remote-control
--json -c 'chatgpt_base_url="https://chatgpt.com/backend-api"'` reached
`status:"connected"` with
`environmentId:"env_i_6a17d9f1d764832986da2e80f4554f1b"`.
# Why
Fixes #23993.
Hook command output schemas are published as the contract for hook
authors and schema-driven tooling. The event-specific output schemas
previously described `hookSpecificOutput.hookEventName` as the global
`HookEventNameWire` enum, so a `pre-tool-use.command.output` schema
would validate mismatched values like `PostToolUse`. That made the
schemas less precise than the intended event-specific contract.
# What
Constrain each hook-specific output schema to the matching literal
`hookEventName` value, mirroring the existing input-schema shape.
Also split `SubagentStartHookSpecificOutputWire` from the session-start
output wire so `subagent-start.command.output.schema.json` can emit
`const: "SubagentStart"` instead of sharing the session-start
definition.
# Verification
- `cargo nextest run -p codex-hooks`
- `just fix -p codex-hooks`
- `just argument-comment-lint -p codex-hooks -- --all-targets`
## Why
The Windows Bazel job on `main` started failing after #24108 because one
Windows-only capture test still passed `cwd.as_path()` to
`run_windows_sandbox_capture`. That helper now expects the explicit
`workspace_roots` slice introduced by #24108, so the Windows test target
no longer compiled.
## What Changed
- Updates `legacy_capture_cancellation_is_not_reported_as_timeout` to
pass `workspace_roots_for(cwd.as_path()).as_slice()`, matching the
adjacent capture test and the new runner signature.
## Verification
- GitHub Actions CI is the important validation for this Windows-only
compile path.
- Created quickly to get Windows CI running while the separate Ubuntu
`compact_resume_fork` timeout is still under investigation.
## Why
#23813 switches the Windows sandbox runner path to `PermissionProfile`,
but it still left one runtime anchor for resolving symbolic
`:workspace_roots` entries. That is not enough once a turn has multiple
effective workspace roots: exact entries and deny globs under
`:workspace_roots` need to be materialized for every runtime root before
the command runner chooses token mode or builds ACL plans.
## What Changed
- Replaces the Windows runner/setup `permission_profile_cwd` plumbing
with `workspace_roots: Vec<AbsolutePathBuf>`.
- Resolves Windows-local `PermissionProfile` data with
`materialize_project_roots_with_workspace_roots(...)` instead of the
single-cwd helper.
- Threads `Config::effective_workspace_roots()` through core execution,
unified exec, TUI setup/read-grant flows, app-server setup, app-server
`command/exec`, and `debug sandbox` on Windows.
- Preserves those workspace roots through the zsh-fork escalation
executor instead of rebuilding them from `sandbox_policy_cwd`.
- Makes `ExecRequest::new(...)` and the remaining
`build_exec_request(...)` helper path take
`windows_sandbox_workspace_roots` explicitly so new call sites cannot
silently fall back to `vec![cwd]`.
- Clarifies the `debug sandbox` non-Windows comment: remaining
cwd-dependent resolution still uses `sandbox_policy_cwd`, while
`:workspace_roots` entries are already materialized from config roots.
- Updates elevated runner IPC `SpawnRequest` to send `workspace_roots`
and bumps the framed IPC protocol version to `3` for the payload shape
change.
- Adds Windows-local resolver coverage for expanding exact and glob
`:workspace_roots` entries across multiple roots, plus core helper
coverage proving explicit roots are preserved.
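Conceptually, materializing `:workspace_roots` entries across several roots is a cross product of entries and roots. The sketch below is a simplification, not `materialize_project_roots_with_workspace_roots(...)`: it uses plain strings, a hypothetical `materialize` function, and `/` joins rather than real Windows path handling.

```rust
/// Hypothetical expansion: every relative entry (exact path or glob) is
/// materialized once per workspace root, keeping its access level.
pub fn materialize(roots: &[&str], entries: &[(&str, &str)]) -> Vec<(String, String)> {
    let mut out = Vec::new();
    for (relative, access) in entries {
        for root in roots {
            let path = if *relative == "." {
                root.to_string()
            } else {
                format!("{root}/{relative}")
            };
            out.push((path, access.to_string()));
        }
    }
    out
}
```

With a single-cwd helper, the second root would never receive its `deny` entries, which is the gap this change closes.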
## Verification
- `cargo check -p codex-windows-sandbox -p codex-core -p codex-tui -p
codex-cli -p codex-app-server`
- `cargo test -p codex-windows-sandbox`
- `cargo test -p codex-core windows_sandbox`
- `cargo test -p codex-core unix_escalation`
- `cargo test -p codex-app-server windows_sandbox`
- `cargo test -p codex-tui windows_sandbox`
- `cargo test -p codex-cli debug_sandbox`
- `just test -p codex-core unified_exec`
- `just test -p codex-core
build_exec_request_preserves_windows_workspace_roots`
- `env -u CODEX_NETWORK_PROXY_ACTIVE -u
CODEX_NETWORK_ALLOW_LOCAL_BINDING just test -p codex-app-server --lib
command_exec`
- `just test -p codex-windows-sandbox`
- `just test -p codex-exec sandbox`
- `just fix -p codex-core -p codex-app-server -p codex-windows-sandbox`
A local macOS cross-check with `cargo check --target
x86_64-pc-windows-msvc ...` did not reach crate Rust code because native
dependencies require Windows SDK headers (`windows.h` / `assert.h`) in
this environment; Windows CI remains the real target validation.
Two local targeted filters compile but do not run assertions on macOS:
`env -u CODEX_NETWORK_PROXY_ACTIVE -u CODEX_NETWORK_ALLOW_LOCAL_BINDING
just test -p codex-app-server --lib command_exec_processor` matched zero
tests, and `just test -p codex-linux-sandbox landlock` matched zero
tests because the landlock suite is Linux-only.
## Summary
Some permission profiles can encode filesystem reads that should remain
unavailable to the agent. Before this change, the model-visible context
and automatic approval review prompt summarized the effective
permissions as a legacy sandbox mode, which can omit permission-profile
filesystem entries from escalation decisions.
For example, a profile can grant workspace access while denying a
private subtree across every workspace root:
```toml
default_permissions = "restricted-workspace"
[permissions.restricted-workspace.workspace_roots]
"/Users/alice/project" = true
"/Users/alice/other-project" = true
[permissions.restricted-workspace.filesystem]
":minimal" = "read"
[permissions.restricted-workspace.filesystem.":workspace_roots"]
"." = "write"
"private" = "deny"
"private/**" = "deny"
```
The context window now describes the workspace roots and effective
filesystem side of the `PermissionProfile` directly, with deny entries
marked as non-escalatable:
```xml
<environment_context>
<cwd>/Users/alice/project</cwd>
<shell>zsh</shell>
<filesystem><workspace_roots><root>/Users/alice/project</root><root>/Users/alice/other-project</root></workspace_roots><permission_profile type="managed"><file_system type="restricted"><entry access="read"><special>:minimal</special></entry><entry access="write"><path>/Users/alice/project</path></entry><entry access="write"><path>/Users/alice/other-project</path></entry><entry access="deny" escalatable="false"><path>/Users/alice/project/private</path></entry><entry access="deny" escalatable="false"><path>/Users/alice/other-project/private</path></entry><entry access="deny" escalatable="false"><glob>/Users/alice/project/private/**</glob></entry><entry access="deny" escalatable="false"><glob>/Users/alice/other-project/private/**</glob></entry></file_system></permission_profile></filesystem>
</environment_context>
```
Managed requirements can impose the same kind of deny-read restriction:
```toml
[permissions.filesystem]
deny_read = [
"/Users/alice/project/private",
"/Users/alice/project/private/**",
]
```
The automatic approval review prompt also receives the parent turn's
denied-read context, so review decisions can account for the active
permission profile.
## What Changed
- Render the effective filesystem profile in `<environment_context>`,
including profile type, filesystem entries, workspace roots, and
non-escalatable deny entries.
- Persist effective `workspace_roots` in `TurnContextItem` so
resumed/replayed context does not have to bind `:workspace_roots`
through legacy `cwd` fallback.
- Add explicit permission instructions that denied reads are policy
restrictions, not escalation targets.
- Pass the parent turn's denied-read context into automatic approval
reviews.
- Add targeted coverage for prompt rendering, workspace-root
materialization, replay context, and review prompt context.
- Keep the prompt-context test expectations platform-aware so the same
filesystem rendering assertions pass on Unix and Windows paths.
## Testing
- `just test -p codex-core
context::environment_context::tests::serialize_environment_context_with_full_filesystem_profile`
- `just test -p codex-core
context::environment_context::tests::turn_context_item_filesystem_uses_workspace_roots_instead_of_cwd`
- `just test -p codex-core
context::permissions_instructions::permissions_instructions_tests::builds_permissions_from_profile_with_denied_reads`
- `just fix -p codex-core`
I also attempted `just test -p codex-core`; the changed prompt-context
tests passed, but the full local run did not complete cleanly in this
sandboxed macOS environment due to unrelated user-shell `CODEX_SANDBOX*`
expectations and integration-test timeouts.
## Summary
Adds an optional `clientId` field to app-server v2 `UserInput` and
carries it through the core `UserInput` model so clients can correlate
echoed user input items without relying on payload equality.
## Details
- Adds `client_id: Option<String>` to core `UserInput` variants.
- Exposes the v2 app-server field as `clientId` on the wire and in
generated TypeScript.
- Preserves the id when converting between app-server v2 and core
protocol types.
- Regenerates app-server schema fixtures.
## Validation
- `just fmt`
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-protocol`
- `git diff --check`
## Why
`codex exec-server` has a local WebSocket listener, but it did not apply
the same browser-origin request handling as the `app-server` WebSocket
transport. Requests that carry an `Origin` header should not be upgraded
by this local transport, keeping both local WebSocket servers consistent
and avoiding unexpected browser-initiated connections.
## What changed
- Added an Axum middleware guard in
`codex-rs/exec-server/src/server/transport.rs` that returns `403
Forbidden` for requests carrying an `Origin` header.
- Added an integration test in `codex-rs/exec-server/tests/websocket.rs`
that covers rejection of an `Origin`-bearing WebSocket handshake.
- Kept ordinary WebSocket clients unchanged: existing no-`Origin`
initialization and process behavior remains covered by the crate tests.
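The decision made by the guard can be sketched without Axum as a pure function over header pairs. `status_for_upgrade` is a hypothetical name, and returning `101` for the accepted case is only a stand-in for "let the upgrade proceed".

```rust
/// Hypothetical guard: reject any handshake that carries an `Origin` header.
/// Header names are compared case-insensitively, as HTTP requires.
pub fn status_for_upgrade(headers: &[(&str, &str)]) -> u16 {
    let has_origin = headers
        .iter()
        .any(|(name, _)| name.eq_ignore_ascii_case("origin"));
    if has_origin {
        403 // Forbidden: browser-initiated connection
    } else {
        101 // Switching Protocols: ordinary client
    }
}
```

Native WebSocket clients normally omit `Origin`, while browsers always send it, so the header's presence is a cheap signal of a browser-initiated request.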
## Validation
- `just test -p codex-exec-server` test phase (`186 passed`; run outside
the parent macOS sandbox so nested sandbox tests can execute)
- `just clippy -p codex-exec-server`
## Why
When Guardian or the sandbox network proxy detects and denies a network
attempt, core cancels the associated execution through `ExecExpiration`.
The Windows sandbox capture path was only forwarding the timeout
component of that expiration state. As a result, a sandboxed Windows
command whose network attempt had already been denied could keep running
until its timeout elapsed rather than terminating promptly in response
to the denial.
This change closes that cancellation-propagation gap for Windows sandbox
execution.
## What changed
- Added `WindowsSandboxCancellationToken` as the cancellation hook
exposed to Windows capture backends.
- Extracted the cancellation token from `ExecExpiration` in core and
passed it to both the direct and elevated Windows sandbox capture paths
alongside the existing timeout.
- Updated direct capture to poll for process exit, timeout, or
cancellation, and to terminate cancelled processes without reporting
them as timed out.
- Updated elevated capture to watch for cancellation and send the
existing `Terminate` IPC frame to the elevated runner. The watcher parks
for 50 ms between checks to bound response latency without a tight busy
wait.
- Added Windows regression coverage for a long-running PowerShell
command: cancellation ends capture before its timeout and does not set
`timed_out`.
- Added a visible skip diagnostic when that PowerShell-dependent
regression test cannot execute, and consolidated the duplicated
expiration-policy branch identified in review.
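The polling behavior can be approximated with the sketch below. It is not the Windows capture code: an `AtomicBool` stands in for `WindowsSandboxCancellationToken`, a closure stands in for the process-exit check, and `wait_for_outcome` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum CaptureOutcome {
    Exited,
    TimedOut,
    Cancelled,
}

/// Poll until the process exits, the token is cancelled, or the timeout elapses.
/// Cancellation is reported separately so it is never mistaken for a timeout.
pub fn wait_for_outcome(
    has_exited: impl Fn() -> bool,
    cancelled: &AtomicBool,
    timeout: Duration,
    poll_interval: Duration,
) -> CaptureOutcome {
    let deadline = Instant::now() + timeout;
    loop {
        if has_exited() {
            return CaptureOutcome::Exited;
        }
        if cancelled.load(Ordering::SeqCst) {
            return CaptureOutcome::Cancelled;
        }
        if Instant::now() >= deadline {
            return CaptureOutcome::TimedOut;
        }
        // Sleeping between checks bounds latency without a tight busy wait.
        std::thread::sleep(poll_interval);
    }
}
```

A caller would terminate the child on `Cancelled` or `TimedOut` and set `timed_out` only for the latter.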
## Security
This improves enforcement after a denied network attempt has been
attributed to a Windows sandboxed execution: the command no longer
remains alive simply because Windows capture lost the cancellation
signal.
This PR does not claim to make Windows offline mode an airtight
no-network or no-exfiltration boundary. It does not introduce
AppContainer or change how network denial is detected; it makes an
already-detected denial promptly stop the affected sandboxed command.
## Validation
### Commands run
- `just fmt`
- `cargo test -p codex-windows-sandbox`
- `cargo test -p codex-core network_denial`
- `cargo clippy -p codex-core -p codex-windows-sandbox --tests --no-deps
-- -D warnings`
- `just argument-comment-lint -p codex-windows-sandbox -p codex-core`
The new capture regression is `cfg(target_os = "windows")`, so Windows
CI is the execution coverage for that test path. The local macOS test
runs validate the host-runnable crate and core network-denial behavior.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
#23756 makes packaged Codex builds include and default to the bundled
zsh fork. The important reason to put that fork's directory at the front
of `PATH` is to keep executable-level escalation working after a command
leaves the original shell and later re-enters zsh through `env`.
The expected chain is:
1. The zsh fork runs the top-level shell command.
2. That command launches another program, such as `python3`, while
inheriting the `EXEC_WRAPPER` environment and the escalation socket fd.
3. That program spawns a shell script whose shebang is `#!/usr/bin/env
zsh` rather than `#!/bin/zsh`, and it does not close the escalation fd.
4. `/usr/bin/env` resolves `zsh` through `PATH`, so it must find the
packaged zsh fork before the system zsh.
5. Commands inside that nested script are intercepted by the zsh fork
and can still request escalation from Codex.
If `PATH` resolves `zsh` to the system shell instead, the nested script
loses zsh-fork exec interception. Commands that should request
escalation can then run only in the original sandbox, or fail there,
without Codex ever receiving the approval request.
Shell snapshots make this slightly more subtle: a snapshot can restore
an older `PATH` after the child shell starts. This PR treats the zsh
fork `PATH` prepend as an explicit environment override so snapshot
wrapping preserves it.
## What Changed
- Added shared zsh-fork runtime helpers that prepend the configured zsh
executable parent directory to `PATH` without duplicate entries.
- Applied the zsh fork `PATH` prepend to both zsh-fork `shell_command`
launches and unified-exec zsh-fork launches before sandbox command
construction.
- Kept the shell-command zsh-fork backend API narrow: it derives the
configured zsh path from session services and rebuilds its sandbox
environment from `req.env`, rather than accepting a second, competing
environment map or a separately threaded bin dir.
- Kept Unix-only zsh-fork `PATH` mutation out of Windows clippy-visible
mutability.
- Added coverage for duplicate `PATH` entries, for preserving the zsh
fork prepend through shell snapshot wrapping, and for the nested
`python3` -> `#!/usr/bin/env zsh` escalation flow.
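A minimal sketch of the duplicate-free prepend, assuming a Unix-style `:` separator. `prepend_to_path` is a hypothetical helper; dropping empty `PATH` segments is a simplification made for this example.

```rust
/// Hypothetical helper: put `dir` first in `PATH` and drop any later copies.
pub fn prepend_to_path(path: &str, dir: &str) -> String {
    let mut parts = vec![dir];
    parts.extend(
        path.split(':')
            .filter(|part| !part.is_empty() && *part != dir),
    );
    parts.join(":")
}
```

Because the function is idempotent, reapplying it after shell snapshot wrapping restores the zsh fork's position without growing `PATH`.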
## Testing
- `just fmt`
- `just fix -p codex-core`
I left final test validation to CI after the latest review-comment
cleanup. Before that cleanup, `just test -p codex-core zsh_fork` passed
locally for the zsh-fork-focused tests.
Fixes #12496.
## Why
Windows sandboxed PowerShell commands can run under
`ConstrainedLanguage` on some machines, especially enterprise-managed
Windows environments. In that mode, our PowerShell command prelude could
fail before every command because it directly assigned
`[Console]::OutputEncoding` to UTF-8. The actual user command still ran,
but Codex surfaced noisy `Cannot set property. Property setting is
supported only on core types in this language mode.` output for every
shell call.
## What Changed
- Makes the PowerShell UTF-8 output encoding prelude best-effort by
wrapping the assignment in `try { ... } catch {}`.
- Keeps the existing UTF-8 behavior when PowerShell allows the
assignment.
- Adds focused tests for adding the prelude and avoiding duplicate
prelude insertion.
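The best-effort prelude can be sketched as follows. The exact prelude text and the `with_utf8_prelude` helper are assumptions for illustration; the real string in `codex-shell-command` may differ.

```rust
/// Assumed prelude text: the assignment is wrapped in try/catch so that
/// `ConstrainedLanguage` mode cannot surface a property-setting error.
const UTF8_PRELUDE: &str =
    "try { [Console]::OutputEncoding=[System.Text.Encoding]::UTF8 } catch {}; ";

/// Hypothetical helper: add the prelude once, never twice.
pub fn with_utf8_prelude(script: &str) -> String {
    if script.starts_with(UTF8_PRELUDE) {
        script.to_string()
    } else {
        format!("{}{}", UTF8_PRELUDE, script)
    }
}
```

When PowerShell permits the assignment, behavior is unchanged; when it does not, the empty `catch {}` swallows the error and the user command still runs.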
## Validation
- `cargo fmt -p codex-shell-command`
- `cargo check -p codex-shell-command`
- `git diff --check`
- Verified a local `ConstrainedLanguage` PowerShell probe prints only
the command output with no property-setting error.
- Verified `codex exec` from a temporary `chcp 437` context reports
`utf-8` / `65001` and preserves non-ASCII output (`café`, `漢字`).
## Why
`/diff` is intended to display working-tree changes, but its Git
invocations honored repository-selected executable helpers. A repository
could configure diff/text conversion helpers, clean/process filters,
`core.fsmonitor`, or `post-index-change` hooks that execute when a user
runs `/diff`.
Fixes
[PSEC-4395](https://linear.app/openai/issue/PSEC-4395/codex-cli-diff-executes-repository-selected-diff-helpers).
## What Changed
- Pass `--no-textconv` and `--no-ext-diff` for tracked and untracked
diff generation.
- Discover configured `filter.<driver>.clean` and `.process` entries,
then neutralize the selected drivers through structured
`GIT_CONFIG_KEY_*` / `GIT_CONFIG_VALUE_*` overrides, including driver
names containing `=`.
- Run all `/diff` Git probes with `core.fsmonitor=false` and a null
`core.hooksPath`.
- Use short submodule reporting while ignoring dirty submodule
worktrees, since inspecting a checked-out submodule for dirtiness can
execute filters from that child repository. This intentionally omits
dirty-only submodule markers in order to preserve the non-executing
security boundary.
- Add real-Git marker tests covering filters, fsmonitor, hooks, and
configured helpers inside checked-out submodules.
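A rough sketch of how such a hardened invocation could be assembled. Both helpers are hypothetical; using `/dev/null` for the null hooks path and an empty string as the neutralizing filter value are assumptions, not the exact values used in this PR.

```rust
/// Hypothetical argument list for a non-executing `git diff`.
pub fn hardened_diff_args() -> Vec<String> {
    [
        "-c", "core.fsmonitor=false",
        "-c", "core.hooksPath=/dev/null", // assumed null path on Unix
        "diff",
        "--no-textconv",
        "--no-ext-diff",
        "--submodule=short",
        "--ignore-submodules=dirty",
    ]
    .iter()
    .map(|arg| arg.to_string())
    .collect()
}

/// Hypothetical environment overrides for discovered filter drivers.
/// Keys and values are passed separately, so driver names containing `=`
/// cannot be misparsed the way a `-c key=value` argument could be.
pub fn filter_override_env(drivers: &[&str]) -> Vec<(String, String)> {
    let mut env = Vec::new();
    let mut index = 0;
    for driver in drivers {
        for key in ["clean", "process"] {
            env.push((format!("GIT_CONFIG_KEY_{index}"), format!("filter.{driver}.{key}")));
            env.push((format!("GIT_CONFIG_VALUE_{index}"), String::new()));
            index += 1;
        }
    }
    env.insert(0, ("GIT_CONFIG_COUNT".to_string(), index.to_string()));
    env
}
```

The resulting pairs would be applied with `std::process::Command::envs` before spawning Git.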
## How to Test
1. In a repository with ordinary tracked and untracked edits, run
`/diff`.
2. Confirm the normal working-tree diff is shown for top-level files.
3. Run the targeted tests below; they configure executable marker
helpers for repository filters, fsmonitor, hooks, and a checked-out
submodule, then verify `/diff` does not invoke them.
4. Confirm a dirty-only submodule does not cause Codex to enter the
submodule and execute its configured helper.
Targeted tests:
- `just test -p codex-tui get_git_diff_`
Validation note: `just test -p codex-tui` runs the new coverage, but
this worktree currently also has two unrelated failing guardian tests:
`app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
and
`app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`.
## Summary
- Add `--stdio` as a direct alias for `codex app-server --listen
stdio://`.
- Keep `--stdio` and `--listen` mutually exclusive.
- Update the app-server README to document both forms.
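The alias and its exclusivity rule can be illustrated with a small std-only sketch. The function name, the error text, and the `Option` return for the "neither flag given" case are assumptions; the real CLI presumably expresses this through its argument parser.

```rust
/// Resolves the listen target from the two flags.
/// Returns an error when both are given, mirroring the mutual exclusion.
fn resolve_listen(stdio: bool, listen: Option<&str>) -> Result<Option<String>, String> {
    match (stdio, listen) {
        // Hypothetical error message.
        (true, Some(_)) => Err("--stdio cannot be used with --listen".to_string()),
        // `--stdio` is shorthand for `--listen stdio://`.
        (true, None) => Ok(Some("stdio://".to_string())),
        // No alias: pass through whatever `--listen` supplied, if anything.
        (false, url) => Ok(url.map(str::to_string)),
    }
}
```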
The codex-windows runner group should be much faster than the default
GitHub Actions runners. Since Bazel jobs on Windows are frequently the
long pole for PR checks, this should help people land changes a bit faster.
## Why
Add a standalone image generation path that can be exercised
independently of hosted Responses image generation, while retaining the
hosted tool as fallback unless the extension is actually available to
the model.
## What changed
- Added the `codex-image-generation-extension` crate with standalone
generate/edit execution, prior-image selection for edits, model-visible
image output, and local generated-image persistence.
- Installed the extension in app-server behind the disabled-by-default
`imagegenext` feature and backend eligibility checks.
- Updated core tool planning so eligible `image_gen.imagegen` exposure
replaces hosted `image_generation`, while unavailable configurations
retain hosted fallback.
- Added coverage for extension behavior, edit history reuse, feature
gating, auth eligibility, and hosted-tool replacement.
- The extension is installed through app-server only in this PR; other
execution paths retain hosted image generation because hosted
replacement occurs only when the standalone executor is actually
registered and model-visible.
- The initial extension contract intentionally fixes the image model to
`gpt-image-2` and uses automatic image parameters.
- Native generated-image history/card parity and rollout persistence
cleanup are intentionally deferred follow-up work.
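The replacement rule described above can be sketched as a single decision function. The types and field names here are hypothetical; only the tool names `image_gen.imagegen` and `image_generation` come from the description, and the sketch ignores the separate gates that decide whether the hosted tool is offered at all.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ImageTool {
    /// Hosted Responses tool (`image_generation`).
    Hosted,
    /// Standalone extension tool (`image_gen.imagegen`).
    Standalone,
}

/// Hypothetical summary of the conditions named in the description.
struct ImageGenAvailability {
    feature_enabled: bool,     // the `imagegenext` feature, disabled by default
    backend_eligible: bool,    // backend/auth eligibility checks passed
    executor_registered: bool, // standalone executor is installed and model-visible
}

/// Standalone replaces hosted only when every condition holds;
/// any unavailable configuration keeps the hosted fallback.
fn plan_image_tool(a: &ImageGenAvailability) -> ImageTool {
    if a.feature_enabled && a.backend_eligible && a.executor_registered {
        ImageTool::Standalone
    } else {
        ImageTool::Hosted
    }
}
```

Under this model, execution paths that never register the executor (everything except app-server in this PR) always resolve to the hosted tool.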
## Validation
- `just test -p codex-image-generation-extension`
- `just test -p codex-features`
- `just test -p codex-core
hosted_tools_follow_provider_auth_model_and_config_gates`
- `just test -p codex-app-server`
- `just fix -p codex-image-generation-extension -p codex-features -p
codex-core -p codex-app-server`
- `just fmt`
- `just bazel-lock-update`
- `just bazel-lock-check`
---------
Co-authored-by: jif-oai <jif@openai.com>
## Why
#24744 introduced the thread idle lifecycle hook so idle continuation
can be owned by lifecycle contributors instead of hard-coded goal
runtime plumbing. Task completion still called
`goal_runtime_apply(GoalRuntimeEvent::MaybeContinueIfIdle)` directly, so
the post-turn idle transition remained goal-specific and did not notify
generic thread lifecycle contributors.
## What Changed
- Add `Session::emit_thread_idle_lifecycle_if_idle()` to gate idle
emission on both no active turn and no queued trigger-turn mailbox work.
- Call that helper when a task clears the active turn, replacing the
direct `GoalRuntimeEvent::MaybeContinueIfIdle` path.
- Cover the behavior with `codex-core` session tests for emitting after
task completion and suppressing idle emission while trigger-turn mailbox
work is pending.
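A simplified model of the gate described above, assuming a stand-in `SessionState` struct rather than the real `Session` type. The field names and the counter are hypothetical; only the method name and the two conditions come from the description.

```rust
/// Stand-in for the session state relevant to idle detection.
#[derive(Default)]
struct SessionState {
    active_turn: Option<u64>,         // hypothetical turn id
    pending_trigger_turn_mail: usize, // queued trigger-turn mailbox items
    idle_emissions: usize,            // stands in for notifying lifecycle contributors
}

impl SessionState {
    /// Emits the idle lifecycle event only when there is no active turn
    /// and no queued trigger-turn mailbox work. Returns whether it emitted.
    fn emit_thread_idle_lifecycle_if_idle(&mut self) -> bool {
        if self.active_turn.is_some() || self.pending_trigger_turn_mail > 0 {
            return false;
        }
        self.idle_emissions += 1;
        true
    }

    /// Task completion clears the active turn, then runs the generic gate
    /// instead of a goal-specific continuation call.
    fn complete_task(&mut self) -> bool {
        self.active_turn = None;
        self.emit_thread_idle_lifecycle_if_idle()
    }
}
```

The two session tests map onto this model directly: completion with an empty mailbox emits once, and completion with pending trigger-turn work emits nothing.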
## Verification
- New tests in `core/src/session/tests.rs` exercise the idle lifecycle
emission and trigger-turn mailbox guard.
This change keeps unified @mentions behind the mentions_v2 gate, moves
the flag to under-development, and polishes mention rendering/history
behavior.
It also adds a few small improvements to the mentions feature around
mention rendering and history round-tripping for plugin/tool mentions in
message edit scenarios. Plugin selections now insert `@` mentions with
better casing, and saved history preserves the visible sigil so recalled
messages look the same as what the user typed.
- Preserves `@` sigils when encoding/decoding mention history for
tool/plugin paths.
- Improves plugin mention insertion so display names/casing are
reflected more cleanly in the composer.
- Updates the composer to render user-entered plugin mentions in the same
color as the mentions menu. Also applies to recalled/edited messages.
- Left/right arrows no longer switch unified-mention search modes after
an `@` mention has already been accepted (e.g., arrowing left through a
composed message that contains `@` mentions).
- Keeps bound mentions stable around punctuation, so accepted `@`
mentions do not reopen the popup and punctuated `$` mentions still
persist to cross-session history.
**Steps to test**
- Ensure mentions_v2 is enabled through configuration or `--enable
mentions_v2`
- Type `@` in the TUI composer and verify filesystem/plugin/skill
results are displayed in the unified mentions menu.
- Select a plugin mention from the `@` popup and confirm the inserted
text is an `@...` mention with the expected display casing, then
recall/edit the message and confirm it still renders as `@...`.
- Mention a skill and verify that skills still insert as `$skill`
mentions rather than `@` mentions.
- Verify punctuated mentions such as `@plugin.` and `($skill)` keep
their bound mention behavior across editing and history recall.
2026-05-28 10:30:15 -07:00
897 changed files with 41970 additions and 14360 deletions
# See https://docs.github.com/en/actions/using-jobs/using-concurrency and https://docs.github.com/en/actions/learn-github-actions/contexts for more info.
@@ -55,7 +55,7 @@ In the codex-rs folder where the rust code lives:
trivial; prefer new modules/files and keep `chatwidget.rs` focused on orchestration.
- When running Rust commands (e.g. `just fix` or `just test`) be patient with the command and never try to kill them using the PID. Rust lock can make the execution slow, this is expected.
Run `just fmt` (in `codex-rs` directory) automatically after you have finished making Rust code changes; do not ask for approval to run it. Additionally, run the tests:
Run `just fmt` (in the `codex-rs` directory) automatically after you have finished making code changes anywhere in this repository; do not ask for approval to run it. Additionally, run the tests:
1. Do not run `cargo test` directly. Use `just test` so test execution follows the repo defaults.
2. Run the test for the specific project that was changed. For example, if changes were made in `codex-rs/tui`, run `just test -p codex-tui`.
@@ -76,6 +76,49 @@ Particularly when introducing a new concept/feature/API, before adding to `codex
Likewise, when reviewing code, do not hesitate to push back on PRs that would unnecessarily add code to `codex-core`.
## Code Review Rules
### Model visible context
Codex maintains a context (history of messages) that is sent to the model in inference requests.
1. No history rewrite - the context must be built up incrementally.
2. Avoid frequent changes to context that cause cache misses.
3. No unbounded items - everything injected in the model context must have a bounded size and a hard cap.
4. No items larger than 10K tokens.
5. Highlight new individual items that can cross >1k tokens as P0. These need an additional manual review.
6. All injected fragments must be defined as structs in `core/context` and implement ContextualUserFragment trait
### Breaking changes
Search for breaking changes in external integration surfaces:
- app-server APIs
- CLI parameters
- configuration loading
- resuming sessions from existing rollouts
### Test authoring guidance
For agent changes prefer integration tests over unit tests. Integration tests are under `core/suite` and use `test_codex` to set up a test instance of codex.
Features that change the agent logic MUST add an integration test:
- Provide a list of major logic changes and user-facing behaviors that need to be tested.
If unit tests are needed, put them in a dedicated test file (\*\_tests.rs).
Avoid test-only functions in the main implementation.
Check whether there are existing helpers to make tests more streamlined and readable.
### Change size guidance (800 lines)
Unless the change is mechanical the total number of changed lines should not exceed 800 lines.
For complex logic changes the size should be under 500 lines.
If the change is larger, explore whether it can be split into reviewable stages and identify the smallest coherent stage to land first.
Base the staging suggestion on the actual diff, dependencies, and affected call sites.
## TUI style conventions
See `codex-rs/tui/styles.md`.
@@ -110,6 +153,19 @@ See `codex-rs/tui/styles.md`.
## Tests
### Test module organization
- When adding a new test module, define its contents in a separate sibling file rather than inline in the implementation file.
- Use an explicit `#[path = "..._tests.rs"]` attribute so the test filename is descriptive and easy to locate:
```rust
#[cfg(test)]
#[path = "parser_tests.rs"]
mod tests;
```
- This applies only when introducing a new test module. Do not move or rewrite existing inline `#[cfg(test)] mod tests { ... }` modules solely to follow this convention.
### Snapshot tests
This repo uses snapshot tests (via `insta`), especially in `codex-rs/tui`, to validate rendered output.
@@ -219,3 +275,12 @@ These guidelines apply to app-server protocol work in `codex-rs`, especially:
- Validate with `just test -p codex-app-server-protocol`.
- Avoid boilerplate tests that only assert experimental field markers for individual
request fields in `common.rs`; rely on schema generation/tests and behavioral coverage instead.
## Python Development Best Practices
### Ignore Python 2 compatibility
This project uses Python 3+. You should not use the `__future__` module.
If you need to worry about feature compatibility between different 3.xx point releases, check the
closest `pyproject.toml`'s `requires-python` field to see what minimum runtime version is supported.
"description":"Sparse rolling rate-limit update.\n\nClients should merge available values into the most recent `account/rateLimits/read` response or refetch that snapshot. Nullable account metadata may be unavailable in a rolling update and does not clear a previously observed value.",
"properties":{
"rateLimits":{
"$ref":"#/definitions/RateLimitSnapshot"
@@ -2002,6 +2003,7 @@
"sessionFlags",
"plugin",
"cloudRequirements",
"cloudManagedConfig",
"legacyManagedConfigFile",
"legacyManagedConfigMdm",
"unknown"
@@ -2676,6 +2678,16 @@
}
]
},
"individualLimit":{
"anyOf":[
{
"$ref":"#/definitions/SpendControlLimitSnapshot"
},
{
"type":"null"
}
]
},
"limitId":{
"type":[
"string",
@@ -3134,6 +3146,31 @@
"description":"Notification emitted when watched local skill files change.\n\nTreat this as an invalidation signal and re-run `skills/list` with the client's current parameters when refreshed skill metadata is needed.",
"type":"object"
},
"SpendControlLimitSnapshot":{
"properties":{
"limit":{
"type":"string"
},
"remainingPercent":{
"format":"int32",
"type":"integer"
},
"resetsAt":{
"format":"int64",
"type":"integer"
},
"used":{
"type":"string"
}
},
"required":[
"limit",
"remainingPercent",
"resetsAt",
"used"
],
"type":"object"
},
"SubAgentSource":{
"oneOf":[
{
@@ -3365,6 +3402,13 @@
"null"
]
},
"parentThreadId":{
"description":"The ID of the parent thread. This will only be set if this thread is a subagent.",
"type":[
"string",
"null"
]
},
"path":{
"description":"[UNSTABLE] Path to the thread on disk.",
"description":"Sparse rolling rate-limit update.\n\nClients should merge available values into the most recent `account/rateLimits/read` response or refetch that snapshot. Nullable account metadata may be unavailable in a rolling update and does not clear a previously observed value.",
"properties":{
"rateLimits":{
"$ref":"#/definitions/v2/RateLimitSnapshot"
@@ -5927,6 +6029,16 @@
},
"AppConfig":{
"properties":{
"approvals_reviewer":{
"anyOf":[
{
"$ref":"#/definitions/v2/ApprovalsReviewer"
},
{
"type":"null"
}
]
},
"default_tools_approval_mode":{
"anyOf":[
{
@@ -7579,6 +7691,33 @@
"title":"SystemConfigLayerSource",
"type":"object"
},
{
"description":"Enterprise-managed config layer delivered by the cloud config bundle.",
"properties":{
"id":{
"description":"Stable identifier for the delivered layer.",
"type":"string"
},
"name":{
"description":"Admin-facing name for the delivered layer. This is surfaced in diagnostics so users know which cloud layer needs administrator attention.",
"type":"string"
},
"type":{
"enum":[
"enterpriseManaged"
],
"title":"EnterpriseManagedConfigLayerSourceType",
"type":"string"
}
},
"required":[
"id",
"name",
"type"
],
"title":"EnterpriseManagedConfigLayerSource",
"type":"object"
},
{
"description":"User config layer from $CODEX_HOME/config.toml. This layer is special in that it is expected to be: - writable by the user - generally outside the workspace directory",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
Some files were not shown because too many files have changed in this diff