## Summary
Standalone installers and other downstream package consumers need a
stable checksum source for the canonical package archives. Relying on
per-asset metadata makes that harder to consume uniformly, especially
when several package archives are produced in the same release.
This keeps the `codex-package-*.tar.gz` and
`codex-app-server-package-*.tar.gz` assets in the GitHub Release upload
set and adds `codex-package_SHA256SUMS` to `dist/` before the release is
created. The manifest contains one SHA-256 line per package archive and
fails the release job if no package archives are present.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23635).
* #23638
* #23637
* #23636
* __->__ #23635
## Summary
The Linux sandbox should find bundled `bwrap` through the same
package-layout abstraction as the rest of the runtime, instead of
maintaining a separate standalone-specific lookup path.
This adds an `InstallContext` helper for bundled resources and updates
`codex-linux-sandbox` to ask the current install context for
`codex-resources/bwrap` before falling back to the old
executable-relative probes. The tests cover npm-style, standalone, and
canonical package layouts so `bwrap` lookup follows the package
structure introduced earlier in the stack.
## Test plan
- `cargo test -p codex-install-context`
- `cargo test -p codex-linux-sandbox --lib`
- `just fix -p codex-install-context -p codex-linux-sandbox`
- `just bazel-lock-check`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23634).
* #23638
* #23637
* #23636
* #23635
* __->__ #23634
## Why
`code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools`
was failing because code-mode prompt generation used the same nested
tool spec list for both the model-visible `exec` guide and the runtime
`ALL_TOOLS` surface. That allowed deferred MCP/app tools, such as
`calendar_timezone_option_99`, to leak into the `exec` description even
though they should only be discoverable through `ALL_TOOLS` at runtime.
## What changed
Split code-mode nested tool planning into two sets in
`core/src/tools/spec_plan.rs`:
- runtime nested tool specs still include deferred tools, so
`tools[...]` and `ALL_TOOLS` can call them
- `exec` prompt docs only render non-deferred tools, so deferred app
tools stay out of the model-visible guide
## Validation
- `cargo test -p codex-core --test all
code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools --
--nocapture`
- looped the same focused test 5 additional times with `cargo test -q -p
codex-core --test all
code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools`
## Why
The goal extension needs more context when a turn starts than
`turn_store` alone provides.
In particular, goal accounting needs the stable turn id, the effective
collaboration mode, and the cumulative token-usage baseline captured at
turn start so it can:
- suppress goal accounting for plan-mode turns
- compute exact per-turn deltas from cumulative `total_token_usage`
snapshots instead of relying on the most recent usage event alone
- keep the extension-owned accounting path aligned with the host turn
lifecycle
## What
- extend `codex_extension_api::TurnStartInput` to expose `turn_id`,
`collaboration_mode`, and `token_usage_at_turn_start`
- pass the full `TurnContext` plus the captured token-usage baseline
through the turn-start lifecycle emission path
- initialize goal turn accounting from the turn-start baseline and
collaboration mode
- switch goal token accounting to compute deltas from cumulative
`total_token_usage` snapshots
- add coverage for the new turn-start lifecycle fields and for
goal-accounting baseline behavior
## Testing
- added `turn_start_lifecycle_exposes_turn_metadata_and_token_baseline`
in `codex-rs/core/src/session/tests.rs`
- added `ext/goal/tests/accounting.rs` coverage for baseline-aware goal
accounting and plan-mode suppression
## Why
`ext/goal` already had the tool specs and contributor wiring for
`/goal`, but the installed tools still depended on a placeholder backend
that always errored. That meant the extension could not actually own
goal persistence even though the dedicated `thread_goals` store already
exists.
This change wires the extension tools directly to the dedicated goal
store so the extension can create, read, and complete goals against real
state instead of falling back to host-side placeholders.
## What changed
- make `install_with_backend(...)` require
`Arc<codex_state::StateRuntime>` so goal storage is always available
when the extension is installed
- remove the unused no-backend/public backend abstraction from
`ext/goal` and have the tool executors talk directly to `StateRuntime`
- map `thread_goals` rows into the existing protocol response shape for
`get_goal`, `create_goal`, and `update_goal`
- preserve current thread-list behavior by filling an empty thread
preview from the goal objective when a goal is created through the
extension path
- add integration coverage for the installed tool surface, including
successful goal creation and duplicate-create rejection
## Testing
- `cargo test -p codex-goal-extension`
## Why
Remote compaction currently sends a unary `POST /responses/compact` and
waits for the full response before replacing history or emitting the
completed `ContextCompaction` item. Unlike normal `/responses` streaming
requests, this unary compact request had no timeout boundary. If the
backend accepts the request and then stalls before returning a body, the
existing request retry policy never sees a transport error, so the
compact turn can remain stuck after the started item with no completion
or actionable error.
That matches the reported hang shape in issues such as #18363, where
logs show `responses/compact` was posted but no corresponding compact
completion followed. A bounded request timeout gives the existing retry
policy a concrete timeout error to retry instead of letting the user sit
indefinitely on automatic context compaction.
## What
- Add a request timeout to legacy `/responses/compact` calls.
- Size that timeout from the provider stream idle timeout with a
conservative multiplier, so the default compact attempt gets 20 minutes
rather than the 5 minute stream idle window.
- Map API transport timeouts to a request timeout error instead of the
child-process timeout message.
## Testing
- Not run (per request; CI will cover).
## Summary
- migrate exec-server remote registration naming from executor to
environment
- align CLI, public Rust exports, registry error messages, and relay
test fixtures with the environment registry contract
- keep the live registration path and response model consistent with
`/cloud/environment/{environment_id}/register`
## Verification
- `cargo test -p codex-exec-server
remote::tests::register_environment_posts_with_auth_provider_headers
--manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml`
- `cargo test -p codex-exec-server --test relay
multiplexed_remote_environment_routes_independent_virtual_streams
--manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml`
- `cargo check -p codex-cli --manifest-path
/Users/richardlee/code/codex/codex-rs/Cargo.toml` (still running when PR
opened; will update after completion if needed)
add new `EncryptedContent` variant to `FunctionCallOutputContentItem`
ahead of standalone websearch.
we need to be able to receive and pass encrypted function call output
from the new web search endpoint back to responsesapi, as we cannot
expose direct search results.
## Why
The package-builder stack now creates a canonical Codex package
directory where the entrypoint lives under `bin/`, bundled helper
resources live under `codex-resources/`, and bundled PATH-style tools
live under `codex-path/`. That layout is not specific to the standalone
installer: npm, brew, install scripts, and manually unpacked artifacts
should all be able to use the same package shape.
The Rust runtime still only knew about the legacy standalone release
layout, where resources sit next to the executable. A packaged binary
therefore would not identify its package root or prefer the bundled `rg`
from `codex-path/`.
## What changed
- Adds `CodexPackageLayout` to `codex-install-context` and detects it
from an executable path shaped like `<package>/bin/<entrypoint>` when
`<package>/codex-package.json` is present.
- Splits `InstallContext` into an install `method` plus an optional
package layout so the layout is shared across npm, bun, brew,
standalone, and other launch contexts.
- Stores package-layout paths as `AbsolutePathBuf` values.
- Keeps `codex-resources/` and `codex-path/` optional so Codex can still
run with degraded behavior if sidecar directories are missing.
- Updates `InstallContext::rg_command()` to prefer bundled
`codex-path/rg` or `rg.exe`, then fall back to the legacy standalone
resources location, then system `rg`.
- Updates `codex doctor` reporting so package installs show package,
bin, resources, and path directories, and so bundled search detection
recognizes `codex-path/` for any install method.
## Test plan
- `cargo test -p codex-install-context`
- `cargo test -p codex-cli`
- `cargo test -p codex-tui
update_action::tests::maps_install_context_to_update_action`
- `just bazel-lock-check`
## Why
Release CI already builds the Codex entrypoints before staging
artifacts, and the package builder can now package those prebuilt
binaries directly. The workflow should produce package-shaped sidecar
archives from the same staged entrypoints that downstream distribution
channels will eventually consume, without rebuilding `codex` or
`codex-app-server` inside the packaging step.
This intentionally does **not** publish the new package archives as
GitHub Release assets yet. The archives are kept with workflow artifacts
until npm, Homebrew, `install.sh`, winget, and related consumers are
ready to switch over.
## What changed
- Adds a `Build Codex package archive` step to
`.github/workflows/rust-release.yml` after target artifacts are staged.
- Runs `scripts/build_codex_package.py` for both release bundles:
- `primary` builds `codex-package-${TARGET}.tar.gz` with `--variant
codex`.
- `app-server` builds `codex-app-server-package-${TARGET}.tar.gz` with
`--variant codex-app-server`.
- Passes `--entrypoint-bin target/${TARGET}/release/<entrypoint>` so
packages contain the entrypoint already built by the workflow.
- Deletes both package archive names before the final GitHub Release
upload so they remain workflow artifacts only for now.
## Verification
- Parsed `.github/workflows/rust-release.yml` with Ruby's YAML loader.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23582).
* #23596
* __->__ #23582
## Why
The package builder should describe the binaries it is actually
packaging, not require callers to restate release metadata out of band.
A caller-provided `--version` flag can drift from the workspace version,
but running the target entrypoint to discover its version breaks
cross-target packages when the produced binary cannot execute on the
build host.
This PR keeps package metadata tied to the repository source of truth by
reading `[workspace.package].version` from `codex-rs/Cargo.toml`. It
also prepares the package layout for `codex-app-server` packages: the
same package structure can now represent either the CLI entrypoint or
the app-server entrypoint while keeping shared sidecars such as `rg`,
`bwrap`, and Windows sandbox helpers in the existing package
directories.
## What changed
- Removes the `--version` CLI flag from
`scripts/build_codex_package.py`.
- Adds Cargo.toml version discovery for `codex-package.json.version` via
`codex-rs/Cargo.toml`.
- Adds `--entrypoint-bin` so callers can package a prebuilt entrypoint
instead of rebuilding it with Cargo.
- Makes `--variant` an explicit choice between `codex` and
`codex-app-server`, and uses it to select the cargo binary and packaged
`bin/` entrypoint name.
- Updates `scripts/codex_package/README.md` to document variants,
prebuilt entrypoints, and Cargo.toml version detection.
## Verification
- Compiled `scripts/build_codex_package.py` and
`scripts/codex_package/*.py` with `PYTHONDONTWRITEBYTECODE=1`.
- Ran `scripts/build_codex_package.py --help` and verified `--version`
is gone while `--variant` and `--entrypoint-bin` are present.
- Verified the package builder reads version `0.0.0` from
`codex-rs/Cargo.toml`.
- Built a fake cross-target `codex-app-server` package using a
non-executable `--entrypoint-bin`; verified metadata records version
`0.0.0`, variant `codex-app-server`, and `bin/codex-app-server` as the
entrypoint.
- Adds an explicit vertical marketplace kind for plugin/list that
fail-open fetches collection=vertical only when full remote plugins are
disabled.
- Renames the global remote marketplace/cache identity to
openai-curated-remote and materializes remote installs with backend
release versions and app manifests.
Fixes#23223.
## Why
Malformed AGENTS instructions should not fail silently. The reported
issue had invalid UTF-8 in a global `AGENTS.md`; before this change,
Codex treated that decode failure like a missing file, so the personal
instructions disappeared without a user-visible explanation and the
rollout had no `# AGENTS.md instructions` block.
Project-level AGENTS files already used lossy decoding, so their
instructions still appeared, but invalid bytes were replaced without
telling the user. Global and project AGENTS files should behave
consistently: keep usable instruction text when possible, and surface a
diagnostic when bytes had to be replaced.
## What changed
Global `AGENTS.override.md` and `AGENTS.md` loading now reads bytes and
decodes with replacement characters on invalid UTF-8, matching
project-level AGENTS behavior. Both global and project AGENTS loading
now emit a startup warning when invalid UTF-8 is found, and both keep
the instruction text with invalid byte sequences replaced.
Missing files, non-file candidates, empty files, and the existing
`AGENTS.override.md` before `AGENTS.md` precedence keep their current
behavior.
## How users see it
The warnings flow through the existing startup warning surface.
App-server clients receive config-time startup warnings as
`configWarning` notifications during initialization, and thread startup
emits startup warnings as thread-scoped `warning` notifications.
Global AGENTS invalid UTF-8 warnings can appear on both surfaces.
Project-level AGENTS invalid UTF-8 warnings are discovered while
building thread instructions, so they appear as thread-scoped `warning`
notifications. Clients that render warning notifications in the
conversation surface show the message as a visible diagnostic instead of
silently hiding or altering instructions.
## Why
Code mode can use nested unified exec calls as data sources. When those
calls omit `max_output_tokens`, code mode should receive raw command
output so the script can parse or summarize it itself. When code mode
does provide `max_output_tokens`, that explicit nested budget should be
respected, including values above the default unified exec limit, rather
than being capped before code mode sees the result.
## What
- Preserve direct unified exec truncation behavior, while letting
code-mode exec/write_stdin keep `max_output_tokens` as `None` unless
explicitly supplied.
- Make code-mode tool results use raw output when no explicit limit is
present, and use the explicit nested limit directly when one is
specified.
- Refactor unified exec output formatting so `truncated_output` takes
the caller-selected token budget.
- Add e2e integration coverage for explicit nested exec limits, omitted
nested exec limits, outer exec limit propagation, omitted-limit outputs
that exceed both the default and a small truncation policy, explicit
nested limits above those caps, and high explicit limits that still
compact larger command output.
- Reuse the code-mode turn setup helper while directly asserting the
exact exec output item in each test.
## Testing
- `just fmt`
- `git diff --check`
- Not run locally per repo guidance; CI should validate the e2e
integration tests.
## Why
Issue #23214 reports `/ps` showing no background terminals while the
status line still says it is waiting for a background terminal. The race
is in core: `write_stdin` can poll a process that exits before the
response returns. The process manager correctly returns `process_id:
None`, but the handler still emitted a `TerminalInteraction` event using
the requested session id, causing clients to believe a dead process was
still being polled.
Fixes#23214.
## What changed
- Suppress `TerminalInteraction` events for empty `write_stdin` polls
once `response.process_id` is `None`.
- Continue emitting interactions for non-empty stdin, even if that input
causes the process to exit before the response returns.
- Extend the unified exec integration test to assert completed empty
polls do not emit terminal interactions.
## Verification
- `cargo test -p codex-core --test all
unified_exec_emits_one_begin_and_one_end_event`
- `cargo test -p codex-core --test all
unified_exec_emits_terminal_interaction_for_write_stdin`
`cargo test -p codex-core` currently aborts in unrelated
`agent::control::tests::resume_agent_from_rollout_uses_edge_data_when_descendant_metadata_source_is_stale`
with a reproducible stack overflow.
## Why
Plugin and skill loading is useful as warmup and early validation, but
session startup does not need to wait for that work before it can
continue building the session. Keeping it on the serial startup path
adds avoidable latency to every fresh thread start.
We still want invalid skill configurations to show up quickly, and we
want the warmup to exercise the same plugin and skill manager caches
that the normal turn path uses.
## What changed
- moved plugin and skill warmup into the session startup async path
instead of eagerly awaiting it on the serial setup path
- kept the warmup using the session's resolved filesystem/environment
context so skill loading still sees the right roots
- preserved early skill-load error logging so broken skill
configurations still surface during startup
- left the per-turn plugin and skill loading path unchanged, so turns
still use the normal cached managers
## Testing
- Not run locally; relying on CI for validation.
## Why
Clients need a typed permission-profile catalog instead of
reconstructing that state from config internals.
## What changed
- Added `permissionProfile/list` to the app-server v2 protocol with
cursor pagination and optional `cwd`.
- The list response includes built-in permission profiles plus
config-defined `[permissions.<id>]` profiles from the effective config
for the request context.
- Permission profiles keep optional `description` metadata for display
purposes.
- App-server docs and schema fixtures are updated for the new RPC.
## Why
`codex-app-server` is published as a standalone release binary, so it
should support the same basic version inspection behavior users expect
from command-line tools. This is independent of package assembly:
package metadata now comes from `codex-rs/Cargo.toml`, but the
standalone app-server binary should still answer `--version` directly.
## What changed
- Enables Clap's generated `--version` flag for the `codex-app-server`
binary by adding `#[command(version)]` to its top-level parser.
## Verification
- Ran `cargo run -p codex-app-server --bin codex-app-server --
--version` and verified it prints `codex-app-server 0.0.0`.
## Why
`rust-ci-full` was paying the full Cargo nextest build-and-run cost once
per platform, with Windows ARM64 as the long pole. This change moves the
heavy work into one reusable per-platform flow: build a nextest archive
once, then replay it across four shards so the platform lane spends less
time running tests serially. For Windows ARM64, the archive is
cross-compiled on Windows x64 and replayed on native Windows ARM64
shards so the slow ARM64 machine is used for execution rather than
compilation.
## What changed
- split the `rust-ci-full` nextest matrix into five explicit
per-platform reusable-workflow calls
- add `.github/workflows/rust-ci-full-nextest-platform.yml` to build one
archive, upload timings/helpers, replay four nextest shards, upload
per-shard JUnit, and roll the shard status back up per platform
- add Windows CI helpers for Dev Drive setup and MSVC ARM64 linker
environment export so the Windows ARM64 archive can be produced on
Windows x64
- keep the existing Cargo git CLI fetch hardening inside the reusable
workflow, since caller workflow-level `env` does not flow through
`workflow_call`
- document the archive-backed shard shape in
`.github/workflows/README.md`
- raise the default nextest slow timeout to 30s so the sharded full-CI
path does not treat every >15s test as stuck
## Verification
- validated the archive/shard flow with live GitHub Actions runs on this
PR branch
- Windows ARM64 cross-compile latency on completed runs:
- https://github.com/openai/codex/actions/runs/26118759651: `34m30s`
lane e2e, `17m16s` archive build, `9m55s` shard phase
- https://github.com/openai/codex/actions/runs/26120777976: `30m36s`
lane e2e, `17m21s` archive build, `6m50s` shard phase
- comparable pre-cross-compile sharded Windows ARM64 runs were `55m01s`,
`50m21s`, and `46m42s`, so the completed cross-compile runs improved the
lane by roughly `12m` to `24m` versus the prior range
- latest corrected cross-compile run:
https://github.com/openai/codex/actions/runs/26120777976
- Windows ARM64 archive built successfully on Windows x64
- native Windows ARM64 shards started immediately after the archive
upload
- 3/4 Windows ARM64 shards passed; the failing shard hit the same
existing `code_mode` test failure seen outside this lane
- downloaded failed-shard JUnit XML from the validation runs and
confirmed the remaining red is from known test failures, not
archive/shard wiring
- no local Codex tests run per repo guidance
## Notes
- this PR does not change developers.openai.com documentation
## Why
The package builder should be easy to run during local iteration.
Requiring callers to provide both a target triple and an output
directory every time makes the common host-package case more awkward
than necessary.
This PR keeps explicit overrides available, but makes the default
invocation useful: build for the current host platform and place the
package in a fresh temporary directory. Because a temp output path is
otherwise easy to lose, the builder continues to print the final package
directory path when it completes.
## What changed
- Makes `--target` optional and maps the host OS/architecture to
supported Codex package target triples.
- Uses GNU Linux target triples for Linux host defaults, while keeping
the musl targets available for release jobs that pass `--target`
explicitly.
- Makes `--package-dir` optional and creates a new `codex-package-*`
temp directory when omitted.
- Documents the new defaults in `scripts/codex_package/README.md`.
## Verification
- Compiled `scripts/build_codex_package.py` and
`scripts/codex_package/*.py` with `PYTHONDONTWRITEBYTECODE=1`.
- Ran `scripts/build_codex_package.py --help` from outside the repo.
- Verified Linux host detection maps `x86_64` and `aarch64` to GNU
target triples.
- Ran a fake-Cargo package build while omitting both `--target` and
`--package-dir`; verified the generated metadata target, expected
package files, and printed temp package path.
- Ran a fake-Cargo package build for `x86_64-unknown-linux-gnu` and
verified `codex`, `bwrap`, and `rg` are assembled into the package.
## Why
`openai/codex#22169` added a regression test that expects an invalid
child `service_tier` to be rejected, but the test used
`Result::expect_err` on `SpawnAgentHandler::handle`. That requires the
`Ok` type to implement `Debug`, and this handler returns `Box<dyn
ToolOutput>`, so Bazel failed while compiling `codex-core` tests before
it could run them.
## What changed
- Capture the handler result and assert on `result.err()` instead of
calling `expect_err`.
- Keep the same `FunctionCallError::RespondToModel` assertion for the
rejected service tier.
## Verification
- `cargo test -p codex-core
spawn_agent_role_service_tier_does_not_hide_invalid_spawn_request`
## Summary
- remove the unreachable ARC monitor path from MCP tool approval
handling
- delete the unused ARC monitor module/tests and trim the orphaned
safety-monitor decision plumbing
- keep `always allow` approvals on the existing auto-approval
short-circuit without a dead monitor hop
## Testing
- `cargo test -p codex-core mcp_tool_call`
- `just fmt`
- `just fix -p codex-core`
- `git diff --check`
## Additional validation
- Attempted `cargo test -p codex-core`; the library test target passed,
then the integration target failed in this local environment.
- The narrower MCP-focused rerun passed its unit coverage and only hit
missing local `test_stdio_server` binaries in filtered integration
cases.
## Why
The Codex package builder should produce a complete package without
requiring callers to pre-populate `rg` under `codex-cli/vendor` or have
`dotslash` installed on `PATH`. The repo already tracks the
authoritative DotSlash manifest in `codex-cli/bin/rg`, so the builder
can read that metadata directly and fetch the correct ripgrep archive
for the target it is packaging.
## What changed
- Added `scripts/codex_package/ripgrep.py` to parse `codex-cli/bin/rg`
after stripping the shebang, select the target platform entry, download
the configured artifact, and verify the recorded size and SHA-256
digest.
- Added a cache under `$TMPDIR/codex-package/<target>-rg` so verified
archives can be reused without fetching again.
- Extracted `rg`/`rg.exe` from `tar.gz` and `zip` artifacts into the
package-builder cache, then copied that into `codex-path` through the
existing package layout flow.
- Kept `--rg-bin` as an explicit local override for offline tests and
unusual local workflows.
- Documented the default `rg` fetch/cache behavior in
`scripts/codex_package/README.md`.
## Verification
- Ran wrapper/module syntax compilation.
- Ran `scripts/build_codex_package.py --help` from `/private/tmp`.
- Ran a local manifest fetch test covering shebang-stripped manifest
parsing, `tar.gz` extraction, `zip` extraction, size/SHA-256
verification, and cache reuse after deleting the original source
archives.
- Ran fake-cargo package/archive builds for macOS, Linux, and Windows
target layouts with `--rg-bin`, including an assertion that generated
tar archives contain no duplicate member names.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23526).
* #23541
* __->__ #23526
This fixes a regression wher codex could start in the wrong directory
when a live local app-server socket was present. The issue was that
implicit local socket reuse was being treated like an explicit remote
workspace session, which dropped the invoking cwd unless --cd was
passed.
The change separates local socket transport from true remote workspace
semantics.
- Plain local startup keeps local cwd, trust, resume, picker, and
config-refresh behavior.
- Explicit --remote keeps the existing remote cwd behavior.
- Added coverage for launch target selection and local-session
filtering/cwd behavior.
Steps to test:
- Start a local app-server from a different directory than the repo you
want to use.
- Launch codex from a project/worktree without --cd.
- Confirm the session starts in the invoking directory, not the
app-server process directory.
- Confirm explicit codex --remote ... still preserves existing remote
behavior.
## Why
Custom agent roles are ordinary config layers, so a role file can
already express `service_tier` just like other config values. The
spawned-agent tier path needs to preserve that effective role config and
follow the same precedence pattern as model/reasoning.
## What changed
- Apply an explicit spawn-time `service_tier` onto the child config
before role application, so a role config layer can override it just
like role-defined model/reasoning settings do.
- Validate the final effective child tier after the final child model is
known, while still falling back to the parent tier when no child tier
survives.
- Add focused integration coverage for both v1 and v2 proving role TOML
loads a service tier, spawned children keep that role-configured tier,
and a role tier wins over a conflicting spawn-time tier.
## Validation
- `just fmt`
- `git diff --check`
- Local Rust tests not run, per repo guidance; CI should exercise the
new coverage.
# Summary
Unix-socket app-server startup can currently race when multiple launch
attempts target the same `CODEX_HOME`. Those processes can overlap
before the control socket exists, which lets them enter SQLite state
initialization concurrently and reproduce the startup corruption pattern
seen in SSH mode.
This change makes the app-server own that singleton startup guarantee.
Unix-socket startup now takes a `CODEX_HOME`-scoped advisory lock before
SQLite initialization, runs the existing control-socket preparation
check while holding that lock, returns the established `AddrInUse` error
when another live listener already owns the socket, and releases the
lock once the new listener has bound its socket.
# Design decisions
- The singleton rule lives in `app-server --listen unix://`, not in a
desktop-only caller path, so every Unix-socket launch gets the same race
protection.
- A duplicate raw app-server launch returns an error instead of silently
succeeding. The attach operation remains `app-server proxy`, which
continues to connect to an already-running listener.
- The lock is held only across the dangerous startup window: socket
preparation, SQLite initialization, and socket bind. It is not held for
the app-server lifetime.
- Listener detection stays in `prepare_control_socket_path(...)`, so the
preexisting live-listener and stale-socket behavior remains the single
source of truth.
# Testing
Tests: targeted Unix-socket transport tests on the branch checkout, full
`codex-cli` build on `efrazer-db10`, and an SSH-style smoke on
`efrazer-db10` covering concurrent app-server starts, explicit
duplicate-start errors, and absence of SQLite startup-error matches in
launch logs.
## Summary
- Add `list_available_plugins_to_install` as the inventory step for
plugin and connector install suggestions.
- Slim `request_plugin_install` so it only handles the actual
elicitation, instead of carrying the full discoverable list in its
prompt.
- Emit send-time telemetry when an install elicitation is dispatched,
including requested tool identity in the event payload.
- Emit install-result telemetry through `SessionTelemetry`, including
tool type, user response action, and completion status.
- Update registration and tests to cover the new two-step flow while
keeping the existing `tool_suggest` feature gate unchanged.
## Testing
- `just fmt`
- `cargo test -p codex-tools`
- `cargo test -p codex-core request_plugin_install`
- `cargo test -p codex-core list_available_plugins_to_install`
- `cargo test -p codex-core
install_suggestion_tools_can_be_registered_without_search_tool`
- `cargo test -p codex-otel
manager_records_plugin_install_suggestion_metric`
- `cargo test -p codex-otel
manager_records_plugin_install_elicitation_sent_metric`
- `just fix -p codex-core`
- `just fix -p codex-tools`
- `just fix -p codex-otel`
- `cargo check -p codex-core`
## Summary
- move local-only app-server gating out of `MessageProcessor`
- let `fs/*`, `command/exec`, and `process/spawn` resolve local
availability inside their owning processors
- keep `fs/*` mounted for the future environment-param path while
preserving current no-local error behavior
## Validation
- not run locally per Codex repo guidance
## Summary
- Coerce `path: ""` to `None` at the v2 protocol params deserialization
boundary for `thread/resume` and `thread/fork`.
- Restore the pre-ThreadStore running-thread resume behavior: if
`threadId` is already running, rejoin it by id and treat a non-empty
`path` only as a consistency check; otherwise cold resume keeps `history
> path > threadId` precedence.
- Add protocol, resume, and fork regression coverage for empty path
payloads; refresh app-server schema fixtures for the clarified params
docs.
## Tests
- `just fmt`
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol
thread_path_params_deserialize_empty_path_as_none`
- `cargo test -p codex-app-server-protocol --test schema_fixtures`
- `cargo test -p codex-app-server empty_path`
- `RUST_MIN_STACK=8388608 cargo test -p codex-app-server --test all
thread_resume_rejects_mismatched_path_for_running_thread_id`
- `RUST_MIN_STACK=8388608 cargo test -p codex-app-server --test all
thread_resume_uses_path_over_non_running_thread_id`
## Why
Plan mode questionnaires reuse the shared composer for free-form
answers, but the surrounding `request_user_input` overlay still treated
every `KeyCode::Enter` as “advance to the next question.” That made
`Shift+Enter` insert a newline in the composer and then immediately
advance the questionnaire anyway.
Fixes#23448.
## What Changed
- pass the live `RuntimeKeymap` into `RequestUserInputOverlay` so its
embedded composer honors existing `/keymap` composer/editor remaps
- advance free-form questions only on the configured composer submit
binding, instead of any Enter-shaped key event
- add regressions for `Shift+Enter` newline behavior and configured
composer submit bindings inside the questionnaire UI
## How to Test
1. Start Codex in Plan mode and trigger a `request_user_input`
questionnaire with a free-form answer field.
2. Focus the free-form field, type a line, then press `Shift+Enter`.
3. Confirm the answer gains a newline and the questionnaire stays on the
same question.
4. Press the configured submit binding, or plain `Enter` with the
default keymap, and confirm the questionnaire advances as before.
Targeted tests:
- `cargo test -p codex-tui
bottom_pane::request_user_input::tests::freeform_ -- --nocapture`
## Notes
- `cargo test -p codex-tui` still reaches an unrelated existing stack
overflow in
`app::tests::discard_side_thread_removes_agent_navigation_entry` on this
checkout.
- `just argument-comment-lint` is locally blocked by Bazel analysis
failing in external `compiler-rt` before the lint runs.
## Why
Exec-server websocket handling had separate reader and writer tasks for
the same socket. That made websocket control-frame handling asymmetric:
the task reading frames could observe `Ping`, but the task allowed to
write frames was elsewhere. This PR moves each physical websocket onto
one always-running pump so the socket owner can handle application
frames and websocket control frames together.
## What changed
- Refactored direct exec-server websocket connections in `connection.rs`
to use one task that owns the websocket for outbound JSON-RPC, inbound
JSON-RPC, periodic keepalive pings, and `Ping` -> `Pong` replies.
- Refactored relay websocket handling in `relay.rs` the same way for
both the harness-side logical connection and the multiplexed executor
physical socket.
- Preserved the existing keepalive ownership policy: outbound direct
websocket clients still send periodic pings, inbound Axum accepts only
reply with pongs, and relay physical websocket endpoints keep their
existing periodic pings.
- Added focused websocket pump tests for ping/pong, binary JSON-RPC,
relay data, malformed relay text frames, and close/disconnect behavior.
- Reconnect behavior is intentionally left for a follow-up.
## Validation
- Devbox Bazel focused unit target:
- `//codex-rs/exec-server:exec-server-unit-tests
--test_filter='websocket_connection_|harness_connection_|multiplexed_executor_'`
## Why
Codex CLI packaging is currently split across npm staging, standalone
installers, and release bundle creation, which makes it hard to define
and validate a single valid package directory. This adds the first
standalone package builder so later release paths can converge on the
same canonical layout.
## What changed
- Added `scripts/build_codex_package.py` as the stable executable
wrapper around `scripts/codex_package`.
- Added modules for CLI parsing, target metadata, grouped cargo builds,
package layout validation, and archive writing.
- The builder creates a package directory with `codex-package.json`,
`bin/`, `codex-resources/`, and `codex-path`, and can serialize it as
`.tar.gz`, `.tar.zst`, or `.zip`.
- Source-built artifacts are built by one grouped `cargo build`: `codex`
for all targets, `bwrap` for Linux, and the Windows sandbox helpers for
Windows. `rg` remains an input because it is vendored from upstream
rather than built from this repo.
- Added `scripts/codex_package/README.md` to document the package
layout, source-built artifacts, and cargo profile behavior.
## Verification
- Ran wrapper/module syntax compilation.
- Ran `scripts/build_codex_package.py --help` from `/private/tmp`.
- Ran fake-cargo package/archive builds for macOS, Linux, and Windows
target layouts, including an assertion that generated tar archives
contain no duplicate member names.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23513).
* #23526
* __->__ #23513
# What
`SubagentStart` runs once when Codex creates a thread-spawned subagent,
before that child sends its first model request. Thread-spawned
subagents use `SubagentStart` instead of the normal root-agent
`SessionStart` hook.
Configured handlers match on the subagent `agent_type`, using the same
value passed to `spawn_agent`. When no agent type is specified, Codex
uses the default agent type.
Hook input includes the normal session-start fields plus:
- `agent_id`: the child thread id.
- `agent_type`: the resolved subagent type.
`SubagentStart` may return `hookSpecificOutput.additionalContext`. That
context is added to the child conversation before the first model
request.
# Lifecycle Scope
Only thread-spawned subagents run `SubagentStart`.
Internal/system subagents such as Review, Compact, MemoryConsolidation,
and Other do not run normal `SessionStart` hooks and do not run
`SubagentStart`. This avoids exposing synthetic matcher labels for
internal implementation paths.
Also the `SessionStart` hook no longer fires for subagents, this matches
behavior with other coding agents' implementation
# Stack
1. This PR: add `SubagentStart`.
2. #22873: add `SubagentStop`.
3. #22882: add subagent identity to normal hook inputs.
## Context
The CLI rate-limit surfaces previously described usage windows as fixed
5-hour and weekly limits. We want the CLI to display whatever supported
rate-limit period the server returns instead of assuming a 5-hour/1-week
pair. This supports generalized Codex rate-limit periods.
## Summary
- Formats CLI rate-limit warning/status labels only for the supported
returned window durations: approximate 5h, daily, weekly, monthly, and
annual.
- Uses generic fallback copy when a primary or secondary window has no
duration, so missing secondary protection data does not produce stale
weekly copy.
- Uses generic fallback copy for unsupported window durations instead of
adding arbitrary hourly, multi-day, multi-week, or multi-year labels.
- Updates status line and terminal title setup descriptions/previews to
talk about primary/secondary usage limits rather than fixed 5h/weekly
limits.
- Adds rendered insta snapshot coverage for the updated rate-limit
status surfaces and `/status` fallback labels.
## Tests
Tested locally:
- one primary window
- one secondary window
- primary and secondary window
## Why
Filesystem permission profiles used `none` for deny-read entries, which
is less direct than the action the entry actually represents. This
change makes `deny` the canonical filesystem permission spelling while
preserving compatibility for older configs that still send `none`.
## What changed
- rename `FileSystemAccessMode::None` to `Deny`
- serialize and generate schemas with `deny` as the canonical value
- retain `none` only as a legacy input alias for temporary config
compatibility
- update filesystem glob diagnostics and regression coverage to use the
canonical spelling
- refresh config and app-server schema fixtures to match the new wire
shape
## Validation
- `cargo test -p codex-protocol`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core config_toml_deserializes_permission_profiles
--lib`
- `cargo test -p codex-core
read_write_glob_patterns_still_reject_non_subpath_globs --lib`
Earlier in the session, a broad `cargo test -p codex-core` run reached
unrelated pre-existing failures in timing/snapshot/git-info tests under
this environment; the targeted surfaces touched by this PR passed
cleanly.
## Why
The v1 sub-agent tools are a single tool family, but they were exposed
as separate flat function tools. This makes the model-visible surface
less clearly grouped and leaves the legacy names in the same flat
namespace as newer agent tooling.
## What
- Wraps the v1 `spawn_agent`, `send_input`, `resume_agent`,
`wait_agent`, and `close_agent` specs in the `multi_agent_v1` namespace.
- Registers the corresponding handlers with namespaced runtime tool
names.
- Updates tool-planning, deferred tool search, and sub-agent
notification tests to assert the namespace shape and child `spawn_agent`
lookup.
## Verification
- Updated `codex-core` coverage for the v1 multi-agent tool plan,
deferred tool search output, and sub-agent tool descriptions.
## Why
`ContextualUserFragment` needs to be usable behind `dyn` for render-only
paths, but associated constants made the trait non-object-safe.
## What changed
- Replaced associated constants with trait methods so `dyn
ContextualUserFragment` can render fragments.
- Preserved the existing typed `T::matches_text(text)` registration
pattern via `type_markers()`.
- Kept default `render()` on the main trait so implementations only
provide role, markers, and body.
- Added unit coverage for rendering a `Box<dyn ContextualUserFragment>`.
## Verification
- `cargo test -p codex-core contextual_user_fragment_is_dyn_compatible`
- `just fix -p codex-core`
## Why
App and skill toggles are user config mutations too. When the TUI is
attached to a remote app server, writing those toggles into the local
`config.toml` makes the UI report success without updating the server
that actually owns the session.
This is **[2 of 4]** in a stacked series that moves TUI-owned config
mutations onto app-server APIs.
## What changed
- Routed app enable/disable persistence through app-server config batch
writes.
- Routed skill enable/disable persistence through `skills/config/write`.
- Avoided refreshing local config from disk after these writes when the
TUI is connected to a remote app server.
## Config keys affected
- `apps.<app_id>.enabled`
- `apps.<app_id>.disabled_reason`
- `[[skills.config]]` entries keyed by `path`, with `enabled = false`
used for persisted disables
## Suggested manual validation
- Connect the TUI to a remote app server, disable an app, reconnect, and
confirm the app remains disabled from remote config rather than local
disk state.
- Re-enable the same app and confirm both `apps.<app_id>.enabled` and
`apps.<app_id>.disabled_reason` are cleared remotely.
- Disable a skill in the manage-skills UI and confirm a remote
`[[skills.config]]` disable entry appears.
- Re-enable that skill and confirm the disable entry is removed and the
effective enabled state updates without relying on local config reloads.
## Stack
1. [#22913](https://github.com/openai/codex/pull/22913) `[1 of 4]`
primary settings writes
2. [#22914](https://github.com/openai/codex/pull/22914) `[2 of 4]` app
and skill enablement
3. [#22915](https://github.com/openai/codex/pull/22915) `[3 of 4]`
feature and memory toggles
4. [#22916](https://github.com/openai/codex/pull/22916) `[4 of 4]`
startup and onboarding bookkeeping
## Why
Steered input was queued as a `ResponseInputItem`, then parsed back into
a user message before recording. That path loses information that only
exists on `UserInput`, such as UI text elements.
This change keeps turn-local pending input typed as either original
`UserInput` or existing response items, so steered user input reaches
user-message recording without being reconstructed from a response item.
## What changed
- Add `TurnInput` for active-turn pending input.
- Queue `Session::steer_input` as `TurnInput::UserInput`.
- Run pending-input hook inspection only for `TurnInput::UserInput`.
- Process drained pending input item by item: accepted items are
recorded, blocked items append hook context and are skipped.
- Remove the pending-input prepend/requeue path.
## Validation
- `just fmt`
- `just fix -p codex-core`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib
session::tests::task_finish_emits_turn_item_lifecycle_for_leftover_pending_user_input
-- --nocapture`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib steer_input`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib pending_input`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core --test all
pending_input`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core` (unit tests passed:
1835 passed, 0 failed, 4 ignored; integration `all` target failed due
missing helper binaries such as `codex`/`test_stdio_server` plus
unrelated MCP/search/code-mode expectations)
## Why
`run_turn` was still hand-building hook payloads and lifecycle events
for a couple of hook paths. Most hook call sites already delegate
request construction and event emission to `hook_runtime`, which keeps
turn orchestration focused on model-flow decisions rather than hook
plumbing.
This also keeps the legacy `after_agent` message extraction next to the
legacy hook dispatch instead of leaving response-item walking in
`run_turn`.
## What changed
- Added `run_stop_hooks` in `hook_runtime` to build `StopRequest`, emit
preview start events, run the hook, and emit completion events.
- Added `run_legacy_after_agent_hook` in `hook_runtime` to build and
dispatch the legacy `AfterAgent` hook payload, including extracting
input messages from response items.
- Updated `run_turn` to call the hook runtime helpers and keep only the
resulting continuation/block/stop decisions inline.
- Removed the repeated pending session-start hook check from the run
loop.
## Validation
- `cargo test -p codex-core hook_runtime`
## Why
`turn/start` already accepts an input array on the wire, including an
empty array, but core treated empty input as a no-op before the turn
could reach the model. App-server clients need to be able to start a
real turn even when there is no new user message, for example to let the
model proceed from existing thread context.
## What changed
- Removed the `run_turn` early return that skipped empty-input turns
when there was no pending input.
- Kept empty active-turn steering rejected by moving the `steer_input`
empty-input check until after core has determined whether there is an
active regular turn.
- Empty regular turns now refresh `previous_turn_settings` like other
regular turns, so follow-up context injection state advances
consistently.
- Added an app-server v2 integration test proving `turn/start` with
`input: []` emits started/completed notifications, sends one Responses
request, and does not synthesize an empty user message.
## Validation
- `cargo test -p codex-app-server --test all
turn_start_with_empty_input_runs_model_request`