## Summary
Remove the dead experimental `persistExtendedHistory` app-server flag
and collapse rollout persistence to the single policy app-server already
used.
## What Changed
- Removed `persistExtendedHistory` from v2 thread start/resume/fork
params and deleted its deprecation notice path.
- Removed the persistence-mode enums and plumbing through core, rollout,
and thread-store.
- Made rollout filtering mode-free, keeping the existing limited
persisted-history behavior.
## Test Plan
- `just write-app-server-schema`
- `cargo nextest run --no-fail-fast -p codex-app-server-protocol
schema_fixtures`
- `cargo nextest run --no-fail-fast -p codex-app-server
thread_shell_command_history_responses_exclude_persisted_command_executions`
- `cargo nextest run --no-fail-fast -p codex-rollout -p
codex-thread-store`
- final `rg` for removed flag/type names
## Stack
1. Parent PR: #18240 uses named MITM permissions config.
2. This PR wires managed MITM CA trust into spawned child processes.
## Why
When Codex terminates HTTPS for limited mode or MITM hooks, child HTTPS
clients need to trust Codex's managed MITM CA. Exporting proxy URLs
alone is not enough, but blindly replacing user CA settings would be
wrong: it can break custom enterprise/test roots, leak unreadable CA
files into generated bundles, or make the child env disagree with its
sandbox policy.
## Summary
1. Build immutable managed CA bundles under `$CODEX_HOME/proxy` that
include native roots, the managed MITM CA, and only inherited or
command-scoped CA bundles the child is allowed to read.
2. Export curated CA env vars alongside managed proxy env vars while
preserving user CA override semantics, including nested Codex
`SSL_CERT_FILE` precedence.
3. Thread generated CA bundle paths into child sandbox readable roots,
including debug sandbox execution, so the exported env vars work inside
sandboxed commands.
4. Remove only Codex-generated MITM CA bundle env when a child
intentionally drops managed proxying for escalation or no-proxy retry.
5. Document the managed CA bundle behavior and cover env injection,
per-child bundle generation, sandbox readable roots, and no-proxy
cleanup in tests.
## Validation
1. Ran `just test -p codex-network-proxy`.
2. Ran `just test -p codex-protocol`.
3. Ran `just fix -p codex-network-proxy -p codex-protocol`.
4. Tried focused `codex-core` validation, but the crate currently fails
to compile in `core/tests/suite/guardian_review.rs` because an existing
`Op::UserInput` initializer is missing `additional_context`.
---------
Co-authored-by: Eva Wong <evawong@openai.com>
## Why
Fixesopenai/codex#20944.
Desktop side chats are intentionally ephemeral and pathless. They can
still accept live turns while loaded, but after a reload there is no
persisted rollout to resume. In the reported failure mode, Desktop could
send `$CODEX_HOME` as the resume/fork path for one of these pathless
side chats.
`thread/resume` and `thread/fork` prefer an explicit `path` over
`threadId`, and rollout path lookup only checked that a candidate
existed. That let `$CODEX_HOME` pass as a rollout path, so the later
rollout reader tried to open a directory and surfaced the low-level `Is
a directory` error.
## What Changed
- Reject explicit rollout paths that resolve to a directory or other
non-file before attempting to read rollout history.
- Make `codex_rollout::existing_rollout_path` return only plain or
compressed rollout candidates that are actual files.
- Add an app-server regression test that creates an ephemeral fork, runs
a turn while the side thread is loaded, simulates reload, then verifies
both `thread/resume` and `thread/fork` reject `$CODEX_HOME` with `path
is a directory` instead of the OS-level directory-read error.
- Rebase over the `TestAppServer` rename and update the remaining stale
test harness call sites to use `TestAppServer` with `app_server` local
variables.
Relevant code:
- `thread-store/src/local/read_thread.rs` validates explicit rollout
paths before rollout reading:
25b47c8f42/codex-rs/thread-store/src/local/read_thread.rs (L146-L165)
- `rollout/src/compression.rs` now requires file metadata for plain and
compressed rollout candidates:
25b47c8f42/codex-rs/rollout/src/compression.rs (L940-L950)
- The repro test covers the pathless ephemeral side-chat reload case:
25b47c8f42/codex-rs/app-server/tests/suite/v2/thread_fork.rs (L774-L886)
## Verification
- `just test -p codex-app-server
pathless_ephemeral_thread_rejects_codex_home_path_after_reload`
## Why
Production Codex binaries are stripped for distribution, which leaves
crashes and samples from released builds without the symbols needed for
useful stack traces. Publish symbols as separate release assets so
production artifacts stay small while released builds remain
symbolicateable.
## What changed
- Add `.github/scripts/archive-release-symbols-and-strip-binaries.sh` to
package platform-native symbols into `codex-symbols-<artifact>.tar.gz`
assets while stripping the corresponding Unix binaries before signing.
- Build release binaries with full debug information before producing
distribution artifacts.
- Publish macOS `.dSYM` bundles, Linux `.debug` files with
`.gnu_debuglink`, and Windows `.pdb` files.
- Strip Linux `bwrap` before computing its packaged-resource digest, but
intentionally omit `bwrap` from symbol archives.
- Preserve symbols artifacts in the unsigned macOS promotion flow.
## Verification
- Ran `shellcheck` and `bash -n` on
`.github/scripts/archive-release-symbols-and-strip-binaries.sh`.
- Parsed the modified workflow YAML files and ran `git diff --check`.
- Built a macOS release smoke binary and verified that the archived
`.dSYM` contains DWARF application source information and has the same
UUID as the stripped production binary.
- Built Linux smoke binaries and verified that the symbol archive
contains `codex.debug`, excludes `bwrap.debug`, leaves the expected
`.gnu_debuglink` in `codex`, and does not mutate the separately stripped
`bwrap` digest.
- Staged a Windows smoke archive and verified that it contains the
expected `.pdb` file.
## Why
The TUI shortcut overlay used static labels for `Tab` and `Ctrl+C`, even
though both keys change behavior while a task is running. That made the
visible help misleading: idle `Tab` submits rather than queues, and
active-turn `Ctrl+C` interrupts rather than exits.
Closes#25531.
Closes#25564.
## What Changed
- Pass task-running state into the shortcut overlay renderer.
- Render `Tab` as `submit message` while idle and `queue message` while
work is running.
- Render `Ctrl+C` as `exit` while idle and `interrupt` while work is
running.
- Add snapshot coverage for the active-work shortcut overlay and update
idle overlay snapshots.
## How to Test
1. Start Codex and open the shortcut overlay with `?` while no task is
running.
2. Confirm the overlay shows `tab to submit message` and `ctrl + c to
exit`.
3. Start a task, then open or keep the shortcut overlay visible while
work is running.
4. Confirm the overlay shows `tab to queue message` and `ctrl + c to
interrupt`.
5. Type a follow-up prompt during active work and press `Tab`; confirm
it queues rather than submitting immediately.
Targeted tests:
- `just test -p codex-tui footer_snapshots`
- `just test -p codex-tui footer_mode_snapshots`
## Validation Notes
`just test -p codex-tui` currently has two unrelated guardian
feature-flag test failures on this base:
-
`app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`
-
`app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
`just argument-comment-lint codex-rs/tui/src/bottom_pane/footer.rs`
could not run locally because the prebuilt wrapper requires `dotslash`;
the touched Rust diff was manually inspected for opaque positional
literals.
Deferred tools need to be searchable even when they are not implemented
inside `codex-core`. Extension-provided tools can be registered for
later discovery, but the search metadata path was still owned by
core-specific runtime hooks, which meant the shared `ToolExecutor`
abstraction could not describe how a deferred extension tool should
appear in `tool_search`.
## Changes
- Move `ToolSearchEntry` and `ToolSearchInfo` into `codex-tools` and
re-export them from the shared tools crate.
- Add a default `ToolExecutor::search_info` implementation that derives
loadable tool-search metadata from function and namespace specs.
- Forward search metadata through extension adapters and exposure
overrides while keeping custom search text/source metadata for dynamic,
MCP, and multi-agent tools.
- Remove the old core-local `tool_search_entry` module now that search
metadata lives with the shared executor APIs.
## Testing
- Added `deferred_extension_tools_are_discoverable_with_tool_search`
coverage in `core/src/tools/spec_plan_tests.rs`.
## Why
#25701 renamed the app-server test harness to `TestAppServer`, but it
raced with #25681, which added a new `plugin_list` test call site still
using the old `McpProcess` name. Once both changes met on `main`,
app-server test builds failed before running the suite because
`McpProcess` no longer exists in that scope.
This PR fixes that CI break by updating the remaining stale call site to
the renamed helper.
## What Changed
- Replaced the `McpProcess::new(...)` use in
`codex-rs/app-server/tests/suite/v2/plugin_list.rs` with
`TestAppServer::new(...)`.
- Renamed the local variable from `mcp` to `app_server` at the same call
site to match the helper rename.
Relevant code:
aadd9c999b/codex-rs/app-server/tests/suite/v2/plugin_list.rs (L234-L246)
## Verification
Not run locally; this is a compile fix for the app-server test harness
rename.
## Summary
- opt the extension-backed standalone `web.run` tool into parallel tool
execution
- update the existing extension registration test to assert that the
tool advertises parallel-call support
## Why
The standalone web-search API endpoint now supports parallel requests.
The extension executor still inherited the shared serial default,
causing multiple `web.run` calls to acquire the exclusive runtime lock.
## Impact
Models that emit multiple standalone web-search calls can now execute
them concurrently when model-level parallel tool calls are enabled.
## Validation
- `just fmt`
- `just test -p codex-web-search-extension`
- `git diff --check origin/main...HEAD`
This PR brought to you via VS Code rather than Codex...
- opened `codex-rs/app-server/tests/common/mcp_process.rs`
- put the cursor on `McpServer`
- hit `F2` and renamed the symbol to `TestAppServer`
- went to the file tree
- hit enter and renamed `mcp_process.rs` to `test_app_server.rs`
- ran **Save All Files** from the Command Palette
- ran `just fmt`
The End
(Admittedly, most of the local variables for `TestAppServer` are still
named `mcp`, though.)
## Why
Python contributions in this repository should target the declared
Python 3 runtime instead of carrying Python 2 compatibility patterns
forward. When compatibility across Python 3 point releases matters,
contributors need a consistent source of truth for the minimum supported
version.
## What changed
- Added Python development guidance to `AGENTS.md` stating that the
repository uses Python 3+ and should not use the `__future__` module.
- Documented that contributors should check the nearest `pyproject.toml`
`requires-python` field when evaluating Python 3 point-release
compatibility.
## Testing
Not run (guidance-only change).
## Summary
- describe omitted code-mode tools as deferred nested tools instead of
MCP/app tools
- update the prompt-description assertion to match
## Why
Deferred dynamic tools are also callable through `tools` and
discoverable in `ALL_TOOLS`, so the previous MCP/app-specific wording
was too narrow.
## Validation
- `just fmt`
- `just test -p codex-code-mode`
- `git diff --check`
## Why
New unit test modules should follow one consistent layout so
implementation files stay focused and test suites remain easy to locate,
without creating cleanup churn in existing inline test modules.
## What changed
- Added `AGENTS.md` guidance requiring new test modules to use separate
sibling `*_tests.rs` files with an explicit `#[path = "..._tests.rs"]`
attribute.
- Clarified that existing inline `#[cfg(test)] mod tests { ... }`
modules should not be moved solely to follow the new convention.
## Validation
- Ran `git diff --check`.
## Summary
Add counter telemetry for the local rollout compression worker so we can
see when it runs, why it skips, and how individual file/materialization
paths resolve.
## Changes
- Emit `codex.rollout_compression.run` with statuses for start,
completion, failure, duplicate-run skip, and missing runtime skip.
- Emit `codex.rollout_compression.file` outcomes for scanned,
compressed, skipped, and failed compression candidates.
- Emit `codex.rollout_compression.temp_cleanup` and
`codex.rollout_compression.materialize` counters for cleanup and
decompression paths.
## Validation
- `just fmt`
- `just test -p codex-rollout`
- `just fix -p codex-rollout`
## Why
When unified exec is configured to launch through the zsh fork, local
commands should not let the model override the shell binary with the
`shell` parameter. The configured zsh fork is the mechanism that makes
`execv(2)` interception reliable, so exposing `shell` for local zsh-fork
execution would create a confusing API surface and undermine the
composition.
Remote environments are different: zsh-fork interception is local-only,
so remote unified-exec calls must keep direct unified-exec behavior and
still expose `shell` when a remote environment can be selected.
## What Changed
- Taught the `exec_command` schema builder to omit the `shell` parameter
when requested.
- Hid `shell` from the unified-exec tool schema only when zsh-fork
unified exec applies to all selectable environments.
- Kept `shell` visible when any remote environment can be targeted,
because those calls run through direct unified exec.
- Made unified exec choose the effective shell mode per selected
environment: local environments keep zsh-fork mode, remote environments
use direct mode.
- Left direct unified-exec behavior unchanged, including support for
model-specified shells there.
## Verification
- Added schema coverage showing `exec_command` can hide `shell`.
- Added planner coverage showing zsh-fork unified exec hides `shell` for
local-only execution while direct unified exec still exposes it.
- Added planner coverage showing `shell` remains visible when a remote
environment is available.
- Added handler coverage showing remote environments use direct
unified-exec shell mode instead of zsh-fork mode.
- Ran the focused `codex-core` shell-parameter and zsh-fork tests.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24980).
* #24982
* #24981
* __->__ #24980
## Why
`shell_zsh_fork` and unified exec need to remain independently
controllable for enterprise rollouts, but we also need a third mode that
composes them. That composed mode is intended to preserve unified exec
command lifecycle support while letting the zsh fork provide more
accurate `execv(2)` interception.
Enabling `unified_exec_zsh_fork` by itself is intentionally not
sufficient. It is a composition gate, not a dependency-enabling
shortcut:
- `unified_exec` selects the PTY-backed unified exec tool.
- `shell_zsh_fork` opts into the zsh fork backend.
- `unified_exec_zsh_fork` only allows those two already-enabled modes to
be composed so local zsh unified exec commands can launch through the
zsh fork.
This separation is deliberate. Enterprises and staged rollouts must be
able to enable or disable unified exec and zsh-fork independently. If
`unified_exec_zsh_fork` implied either dependency, then enabling one
under-development composition flag would silently activate a shell
backend that the configured feature set left disabled.
This PR introduces only the configuration and planning gate for that
composition. Existing `shell_zsh_fork` behavior continues to use the
standalone shell tool unless the new composition feature is explicitly
enabled alongside both dependencies.
## What Changed
- Added the under-development feature flag `unified_exec_zsh_fork`.
- Added `UnifiedExecFeatureMode` so the three input feature flags
collapse into `Disabled`, `Direct`, or `ZshFork` mode before tool
planning.
- Updated tool selection so zsh-fork composition requires
`unified_exec`, `shell_zsh_fork`, and `unified_exec_zsh_fork`.
- Kept the existing standalone zsh-fork shell tool behavior when only
`shell_zsh_fork` is enabled.
- Updated config schema output for the new feature flag.
## Verification
- Added feature and tool-config coverage for the new gate.
- Added planner coverage proving `shell_zsh_fork` remains standalone
until composition is explicitly enabled.
- Ran focused tests for `codex-features`, `codex-tools`, and the
affected `codex-core` planner case.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24979).
* #24982
* #24981
* #24980
* __->__ #24979
## Summary
- add executor filesystem canonicalization as a bound-path operation
- route remote canonicalization through the exec-server filesystem RPC
surface
- keep path normalization attached to the filesystem that owns the path
## Stack
- 2/5 in the skills path authority stack extracted from
https://github.com/openai/codex/pull/25098
- follows merged https://github.com/openai/codex/pull/25121
## Validation
- `cd
/Users/starr/code/codex-worktrees/pr-25098-restack-review-pr1b/codex-rs
&& just fmt`
- Not run: tests/checks (not requested)
- GitHub CI pending on rewritten head
## Why
Guardian auto-review normally uses the provider-preferred review model
when one is available. Some parent models need model-catalog metadata to
select a different review model while keeping older `/models` payloads
compatible when that metadata is absent.
## What changed
- Added optional `ModelInfo::auto_review_model_override` metadata to the
public model payload as a review-model slug.
- Updated Guardian review model selection to prefer the catalog override
when present, while preserving the existing provider preferred-model
path and parent-model fallback when it is omitted.
- Added focused Guardian coverage for override and no-override model
selection.
- Added an `auto_review` core integration suite test that loads override
metadata from a remote model catalog path and asserts the strict
auto-review `/responses` request uses the catalog-selected review model.
- Updated existing `ModelInfo` fixtures and local catalog constructors
for the new optional field.
## Validation
- `cargo test -p codex-protocol
model_info_defaults_availability_nux_to_none_when_omitted`
- `cargo test -p codex-core guardian_review_uses_`
- `cargo test -p codex-core
remote_model_override_uses_catalog_model_for_strict_auto_review --test
all`
- `just fix -p codex-protocol`
- `just fix -p codex-core`
- `just fmt`
- `git diff --check`
## Why
Python files under `scripts/` were not covered by the repository
formatting recipe or the CI formatting job, so formatting drift could
merge unnoticed.
## What
- Add a dedicated `scripts/pyproject.toml` and `scripts/uv.lock` so
root-script formatting uses a locked Ruff version.
- Extend `just fmt` to format root Python scripts and add
`fmt-scripts-check` for CI.
- Run `just fmt-scripts-check` from `.github/workflows/ci.yml`,
installing `uv` through SHA-pinned `astral-sh/setup-uv` while retaining
the `uv` `0.11.3` pin.
- Apply Ruff formatting to the root Python scripts, including
`scripts/just-shell.py`, and extend
`sdk/python/tests/test_artifact_workflow_and_binaries.py` to cover the
root formatting recipe.
- Update `AGENTS.md` so agents run `just fmt` after code changes
anywhere in the repository.
## Validation
- Extended the existing Python SDK workflow test to assert that `just
fmt` includes root Python scripts.
## Why
[#25089](https://github.com/openai/codex/pull/25089) introduced the
background worker that compresses cold archived rollouts, and
[#25654](https://github.com/openai/codex/pull/25654) made that pass
faster once it starts. But the worker still deleted
`rollout-compression.lock` on successful exit, so the existing six-hour
staleness window only helped with overlapping or crashed workers. Each
new local thread-store initialization could immediately rescan archived
rollouts even if a full pass had just finished.
This change keeps the existing marker around long enough to throttle
redundant reruns. The worker is still best-effort, but it no longer does
repeated startup scans when nothing new is eligible for compression.
## What Changed
- Replace the drop-scoped `CompressionLock` with a
`CompressionRunMarker` that claims the existing
`.tmp/rollout-compression.lock` path and leaves it in place after
success.
- Reuse the existing six-hour staleness window to block both overlapping
starts and immediate reruns, while still letting a stale marker be
reclaimed.
- Update the worker docs and debug logging to describe the new "already
running or recently ran" behavior.
- Extend the rollout compression tests to assert that a successful run
leaves the marker behind and that a fresh marker suppresses a new run.
## Validation
- `just test -p codex-rollout`
## Why
`codex_core` is consistently a bottleneck for incremental builds during
iteration. The simplest fix is to make the crate smaller.
## Summary
`codex-core` owns several reusable prompt renderers and static prompt
assets, which makes the crate harder to split apart.
Rename `codex-review-prompts` to `codex-prompts` and move shared review,
goal, permissions, compaction, realtime, hierarchical AGENTS.md, and
`apply_patch` prompts into it. Move prompt-only tests and update
consumers and `CODEOWNERS`.
## Validation
- `just test -p codex-prompts -p codex-apply-patch`
- `just test -p codex-core prompt_caching`
- Bazel builds for the affected crates
## Summary
Make the root `justfile` usable from Windows without maintaining a
separate Windows copy of most recipes.
The repo recipes previously assumed POSIX shell behavior for things like
variadic argument forwarding (`"$@"`) and stderr redirection
(`2>/dev/null`). That made common workflows such as `just fmt`, `just
test`, and `just log` unreliable from Windows. This PR introduces a
small cross-platform shell adapter so recipes can stay mostly unified
while still expanding the few shell-specific constructs correctly on
macOS/Linux and Windows.
## What Changed
- Add `scripts/just-shell.py` as the configured `just` shell adapter.
- On Unix it invokes `sh -cu`.
- On Windows it invokes `pwsh -CommandWithArgs` so arguments containing
spaces are preserved.
- Add portable recipe placeholders:
- `{args}` expands to `"$@"` on Unix and the equivalent PowerShell
forwarded-args expression on Windows.
- `{stderr-null}` expands to the platform-specific stderr suppression
used by `fmt`.
- Convert most variadic one-line recipes to the unified `{args}` form,
including `codex`, `exec`, `file-search`, `app-server-test-client`,
`fix`, `clippy`, `bench`, `mcp-server-run`, `write-app-server-schema`,
and `argument-comment-lint-from-source`.
- Keep genuinely shell-specific recipes split or Unix-only for now,
including recipes backed by `.sh` scripts or recipes whose bodies are
more than simple command forwarding.
- Add a Windows `just install` path that installs PowerShell via
`winget` when `pwsh` is not available, then runs the same basic Rust
setup steps.
- Update the SDK test that validates the root `fmt` recipe so it
recognizes the new portable stderr placeholder.
## Validation
- `just --summary`
- `just --dry-run fmt`
- `just --dry-run bench-smoke`
- `just --dry-run codex foo "bar binky" baz`
- `just --dry-run write-hooks-schema`
- `just --dry-run bazel-lock-update`
- `just --dry-run argument-comment-lint-from-source -- "foo bar"`
- `git diff --check -- justfile scripts/just-shell.py
sdk/python/tests/test_artifact_workflow_and_binaries.py`
- Verified Windows argv preservation through `scripts/just-shell.py`
with arguments containing spaces.
- `uv run --frozen --project sdk/python --extra dev pytest
sdk/python/tests/test_artifact_workflow_and_binaries.py::test_root_fmt_recipe_formats_rust_and_python_sdk`
## Summary
- Preserve app declaration order when loading plugin .app.json files.
- Keep plugin connector summaries in plugin app order after connector
metadata is merged and filtered.
- Add regression coverage for .app.json order and connector summary
order.
## Validation
- just fmt
- just test -p codex-chatgpt
connectors_for_plugin_apps_returns_only_requested_plugin_apps
- just test -p codex-core-plugins
effective_apps_preserves_app_config_order
- just fix -p codex-core-plugins (passes with existing clippy
large_enum_variant warning in core-plugins/src/manifest.rs)
- just fix -p codex-chatgpt
- just bazel-lock-update
- just bazel-lock-check
## Summary
Renames the MultiAgentV2 turn-triggering tool from `assign_task` to
`followup_task` so the exposed tool name better describes sending an
additional task to an existing agent.
This updates the tool spec, handler/module names, registry wiring,
default multi-agent v2 usage hints, and tests. Rollout trace
classification keeps accepting legacy `assign_task` events so older
traces still reduce correctly, while docs show the new tool name.
## Test plan
- `just test -p codex-core followup_task`
- `just test -p codex-core -E
'test(multi_agent_feature_selects_one_agent_tool_family) |
test(multi_agent_v2_can_use_configured_tool_namespace) |
test(code_mode_only_can_expose_namespaced_multi_agent_v2_as_normal_tools)'`
- `just test -p codex-rollout-trace`
- `just fix -p codex-core`
- `just fix -p codex-rollout-trace`
Notes: `just fmt` ran `cargo fmt` but failed in the Python ruff phase
because the local environment could not resolve `hatchling>=1.27.0` from
the configured internal registry. A full `just test -p codex-core` also
hit unrelated environment-sensitive integration failures involving
missing spawned test binaries/sandbox behavior; the changed multi-agent
spec/handler tests passed in the filtered runs above.
## Summary
- add public `codex_exec_server::EnvironmentPathRef`
- bind an absolute path to its owning executor filesystem
- keep path operations in the next review slice
## Stack
- 1/5 in the skills path authority stack extracted from
https://github.com/openai/codex/pull/25098
## Validation
- `cd /Users/starr/code/codex-worktrees/pr-25098-restack4/codex-rs &&
just fmt`
- GitHub CI pending on rewritten head
## Why
[#25089](https://github.com/openai/codex/pull/25089) added the
background worker for compressing cold archived rollouts, but the worker
still processed files effectively one at a time: each compression job
was sent to `spawn_blocking` and then awaited before the next file
started. On machines with a backlog of archived rollouts, that makes
catch-up slower than it needs to be even though the actual compression
work already runs off the async runtime.
## What Changed
- Queue rollout compression work in a `JoinSet` while directory
traversal continues.
- Cap the worker at two in-flight compression jobs so it can overlap
compression without turning the background task into unbounded blocking
work.
- Drain pending jobs before returning, including the
`read_dir.next_entry()` error path, so every launched job still
contributes to the final `compressed`, `skipped`, and `failed` stats.
- Treat task join failures the same way as compression failures in the
worker's warning and failure accounting.
## Summary
- Configure the rust-release build job with
`CARGO_NET_GIT_FETCH_WITH_CLI=true`
- Document the macOS SecureTransport/libgit2 failure mode that hit the
`libwebrtc`/`libyuv` git submodule fetch
## Root cause
The release run at
https://github.com/openai/codex/actions/runs/26717498860/job/78745156683
repeatedly failed before compilation because Cargo's libgit2 fetch path
could not clone the nested `yuv-sys/libyuv` submodule from
`chromium.googlesource.com`, ending with `SecureTransport error:
connection closed via error`.
## Validation
- `git diff --check`
This is a workflow-only change, so I did not run Rust package tests.
## Why
Codex 0.135.0 started shipping bundled SQLite 3.51.x via SQLx 0.9.0 to
avoid the older WAL corruption bug fixed by #24728. On Windows x64,
#25367 reports an immediate `STATUS_ILLEGAL_INSTRUCTION` crash on a
Haswell CPU when starting normal Codex paths.
Rather than downgrading SQLite, this keeps the newer bundled SQLite
source and removes SQLite compiler-intrinsic code paths from the Windows
x64 release build.
## What changed
For `x86_64-pc-windows-msvc` release builds, export
`LIBSQLITE3_FLAGS=SQLITE_DISABLE_INTRINSIC` before `cargo build` in:
- `.github/workflows/rust-release.yml`
- `.github/workflows/rust-release-windows.yml`
Other targets keep their current SQLite build flags.
## Verification
- `git diff --check`
## Rollout compression stack
This stack splits #24941 into reviewable steps for local rollout
compression. The design is intentionally staged:
1. Teach readers, listing, search, and lookup to understand compressed
rollouts.
2. Make append and resume paths materialize compressed rollouts back to
plain JSONL before writing.
3. Add a disabled-by-default worker that can compress cold archived
rollouts behind `local_thread_store_compression`.
The key invariant is that writers append to plain `.jsonl`. A
`.jsonl.zst` file is a cold/read representation; if a write is needed,
the compressed file is materialized back to plain JSONL first. Readers
prefer plain `.jsonl` when both forms exist and can fall back to the
compressed sibling during transitions.
The worker is deliberately the last PR and remains behind an
under-development feature flag. It currently scans only
`archived_sessions`, not active `sessions`, because active sessions have
the highest resume/append race risk. That means this stack does not yet
compress most unarchived local history.
## Known race / follow-up
The remaining unresolved design question is writer/compressor
coordination. Even for archived rollouts, a resume or metadata update
can append while the worker is replacing the plain file with
`.jsonl.zst`; the current double-stat checks narrow but do not fully
eliminate the window where a writer has opened the plain file before
unlink. Do not treat the worker PR as production-ready until we either:
- prevent append/resume paths from racing archived compression, or
- introduce a shared representation/append lock or equivalent
coordination.
The first two PRs are useful independently: they make compressed
rollouts readable and make append paths safely recover back to plain
JSONL. The third PR isolates the worker behavior so that coordination
issue is reviewable separately.
## Validation
Focused local validation for the stack includes:
- `just test -p codex-rollout`
- `just test -p codex-thread-store` where thread-store paths were
touched
- `just test -p codex-features` for the feature flag slice
- `just bazel-lock-check` after dependency graph changes
- scoped `just fix -p ...` passes for changed crates
CI is still the source of truth for the full platform matrix.
## This PR in the stack
This is PR 3/3, based on #25088. It adds the under-development feature
flag and starts the best-effort background worker when enabled. The
worker currently compresses only cold archived rollouts, skips active
sessions, verifies compressed output, preserves mtime and permissions,
keeps a store-level lock heartbeat, and cleans stale temp files.
Stack order:
1. #25087: read compressed local rollouts.
2. #25088: materialize compressed rollouts before append.
3. This PR: add the disabled local compression worker.
## Summary
- preserve existing explicit SQLite thread titles during rollout
reconciliation/backfill when the incoming rollout title is only
first-message-derived
- keep stale inferred-title repair behavior while avoiding session-index
scans during startup backfill
- add a regression test for renamed titles surviving reconcile
## Testing
- just fmt
- just test -p codex-rollout
- just test -p codex-state
Closes#24886.
## Why
Users can configure the TUI status line and terminal title with
`model-with-reasoning`, but issue #24886 asks for a compact
reasoning-only item. That lets a setup show just `default`, `low`,
`medium`, `high`, or `xhigh` without repeating the model name.
## What changed
- Added a `reasoning` item for `/statusline` and `/title` setup flows.
- Rendered the item from the effective reasoning effort, including
collaboration-mode overrides.
- Registered `reasoning` with `codex doctor` so Codex-generated
terminal-title config is not reported as invalid.
- Updated TUI setup snapshots so the picker previews include the new
item.
## Summary
Fixes#25295.
The slash-command popup reused its previous `ScrollState` when the
composer filter token changed. After scrolling the full `/` command
list, typing a narrower filter such as `/st` could clamp the stale
selection into the filtered results and highlight the wrong command.
This resets the popup selection and viewport only when the parsed filter
token changes, so normal arrow navigation is preserved while new filters
start at the first match.
## Why
`codex app [PATH]` is the documented CLI entry point for opening Codex
Desktop on a workspace. Recent desktop builds can focus the app while
failing to honor paths passed as macOS document-open arguments via `open
-a Codex.app <workspace>`, which broke `codex app .` for users. See
#25333; related report: #25166.
The desktop app still supports the explicit
`codex://threads/new?path=...` route, so the CLI should use that
app-owned launch surface instead of depending on folder-open event
delivery.
## What Changed
- Build a `codex://threads/new?path=<workspace>` URL in the macOS app
launcher.
- Pass that URL to `open -a <Codex.app>` instead of passing the
workspace path as a document argument.
- Add coverage that workspace paths needing escaping round-trip through
URL query encoding.
## Verification
- `just test -p codex-cli codex_new_thread_url_encodes_workspace_path`
## Summary
I frequently want to be able to paste into the searchable menu -- the
most common use-case here is when specifying an upstream for a
`/review`, where I copy the upstream from an open terminal.
## Why
`codex exec` was forcing headless runs to `approval_policy = "never"`
even when the resolved reviewer was `auto_review`. That prevented
unattended exec workflows from reaching the reviewed MCP write path they
were configured to use.
## What changed
- Keep the existing headless `never` default for ordinary exec runs.
- Re-resolve exec config without that synthetic override when the final
reviewer resolves to `AutoReview`, so configured or requirements-driven
approval policy is preserved.
- Add regression coverage for:
- `auto_review` plus `on-request` from user config
- requirements-driven `AutoReview`, asserting exec’s final approval
policy matches the no-override control config exactly
## Validation
- `just fmt`
- `cargo test -p codex-exec`
## TL;DR
When you press Esc or Ctrl+C after sending a prompt but before any
output was rendering, it restores the last composer and the message.
## Summary
Cancelling a prompt immediately after submission should behave like
returning to edit that prompt, not like discarding the user's draft.
Today, pressing `Esc` or `Ctrl+C` before Codex responds leaves the
submitted prompt in the transcript and returns an empty composer,
forcing the user to recall or retype it.
When an interrupted turn has not produced substantive visible output,
restore its submitted prompt directly into the composer and roll back
that latest turn. This also covers the first prompt in a fresh thread,
before the TUI has retained a local user-history cell. The restored
draft keeps its text, image attachments, and active collaboration mode
so it can be edited and resubmitted in place.
Restoration is intentionally suppressed once the turn has produced
user-visible activity such as assistant output, tool work, hooks, or
patches. A transient thinking status does not make the prompt
ineligible. Rollback also rebuilds terminal scrollback from the retained
transcript cells so repeated cancellations and terminal resizes do not
duplicate history.
## How to Test
1. Start the TUI with `cargo run -p codex-cli --bin codex`.
2. In a fresh thread, submit the first prompt and press `Esc` before
Codex emits substantive output. Confirm that the prompt returns to the
composer for editing and its submitted transcript row is removed.
3. Repeat with `Ctrl+C`, then repeat after at least one completed turn.
Confirm the same behavior.
4. Submit a prompt, wait for assistant output or tool activity, then
cancel. Confirm that the transcript remains intact and the prompt is not
restored into the composer.
5. Cancel several output-free prompts and resize the terminal between
attempts. Confirm that the startup banner, tip, and transcript history
do not duplicate in scrollback.
Targeted tests:
- `just test -p codex-tui cancelled_turn_edit_restores_prompt`
- `just test -p codex-tui
output_free_interrupted_turn_requests_prompt_restore`
- `just test -p codex-tui
visible_output_prevents_cancelled_turn_prompt_restore`
- `just test -p codex-tui
thinking_status_keeps_cancelled_turn_prompt_restore_eligible`
- `just test -p codex-tui
patch_activity_prevents_cancelled_turn_prompt_restore`
The full `just test -p codex-tui` run completed with `2746` passing
tests and two unrelated existing guardian feature-flag failures. `just
argument-comment-lint` remains blocked locally by the existing Bazel
LLVM `compiler-rt` sanitizer-header glob failure; the touched Rust diff
was manually audited for positional literal comments.
## Summary
- initialize `parent_thread_id` in the compressed rollout test fixture's
`SessionMeta`
- restore rollout test compilation across Bazel test, clippy,
release-build, and argument-comment-lint jobs
## Root cause
PR #25087 (`Read compressed rollouts and materialize before append`)
added `codex-rs/rollout/src/compression_tests.rs` in merge commit
`a8a6071279b6f3112fcc5fc3fee69c48473d7149`. Its `write_rollout` fixture
constructs `SessionMeta` without the required `parent_thread_id` field,
causing `error[E0063]` when Bazel compiles `rollout-unit-tests-bin` on
`main` and downstream PRs.
## Validation
- `UV_CACHE_DIR=/private/tmp/codex-uv-cache just fmt`
- `just test -p codex-rollout` (`59` tests passed; bench smoke passed)
- `git diff --check`
- manually audited the touched Rust diff for positional literal argument
comments; the change adds no positional callsite
## Local lint blocker
- `just argument-comment-lint` could not reach source inspection locally
because Bazel's LLVM dependency fails analysis:
`compiler-rt/BUILD.bazel` glob `include/sanitizer/*.h` matched no files.
## Why
Local rollout compression needs a cold `.jsonl.zst` representation
without letting compressed physical paths leak into append-mode writers.
The unsafe case is resume or metadata update code successfully reading a
compressed rollout and then appending raw JSONL bytes to the zstd file.
This PR folds the former #25088 materialization slice into the
read-support PR so the reader changes and append-safety invariant land
together.
## What Changed
- Teach rollout readers, discovery, listing, search, and ID lookup to
understand compressed `.jsonl.zst` rollouts.
- Keep `.jsonl` as the logical/stored rollout path while allowing read
paths to open either plain or compressed storage.
- Materialize compressed rollouts back to plain `.jsonl` before
append-mode writes, including resume and direct metadata append paths.
- Preserve compressed-file permissions when materializing back to plain
JSONL.
- Refresh thread-store resolved rollout paths after compatibility
metadata writes so reconciliation follows the materialized file.
- Avoid treating transient compression temp files as real rollout lookup
results.
## Remaining Stack
#25089 remains the separate worker PR. It is based directly on this PR
and stays behind the disabled `local_thread_store_compression` feature
flag.
The worker still has a broader coordination question: a resume or
metadata update can race with background compression while a plain file
is being replaced by `.jsonl.zst`. This PR handles the read and
materialize-before-append primitives; it does not make the worker
production-ready.
## Validation
- `just test -p codex-rollout`
- `just test -p codex-thread-store`
- `just fix -p codex-rollout`
- `just fix -p codex-thread-store`
- `just bazel-lock-check`
## Summary
- add an extension-owned `GoalApi` for thread goal get/set/clear
operations
- register live goal runtimes with the API from the goal extension
backend
- cover the API and runtime-effect paths in goal extension tests
## Stack
Follow-up app-server wiring PR: #25108
## Validation
- `just fmt`
- `just fix -p codex-goal-extension`
- `just test -p codex-goal-extension`
## Why
`try_start_turn_if_idle` is the core helper for starting injected input
only when the session is actually idle. It should stay focused on
generic turn-lifecycle safety. The previous `ModeKind::Plan` guard mixed
caller policy into that helper: Plan mode may choose not to auto-start
some extension work, but that decision belongs at the extension or
caller boundary rather than in the session injection primitive.
## What changed
- Removed the `ModeKind::Plan` early return from
`Session::try_start_turn_if_idle`.
- Removed the now-unused `ModeKind` import from
`core/src/session/inject.rs`.
## Testing
Not run locally.
## Why
Goal steering prompts have grown into long inline Rust strings, which
makes the authored prompt text hard to review and easy to damage while
changing the surrounding plumbing. Moving those prompts into embedded
Markdown templates keeps the policy text in the shape reviewers actually
read, while preserving the existing runtime substitution and objective
escaping behavior.
## What changed
- Added `ext/goal/templates/goals/continuation.md`, `budget_limit.md`,
and `objective_updated.md` for the three goal steering prompts.
- Updated `ext/goal/src/steering.rs` to parse those embedded templates
once with `codex-utils-template` and render the existing goal values
into them.
- Kept user objectives XML-escaped before rendering and converted budget
counters into template variables.
- Added the template directory to `ext/goal/BUILD.bazel` `compile_data`
so Bazel has the same embedded prompt inputs as Cargo.
## Testing
- Not run locally.
## Why
The goal extension needs a way to resume an active goal after the thread
becomes idle, but the old core goal runtime should not be refactored as
part of this step. The missing piece is a small core-owned turn-start
primitive: let an extension ask for a normal model turn only when the
thread is idle, and otherwise fail without injecting into whatever is
currently active.
## What Changed
- Adds `CodexThread::try_start_turn_if_idle(...)` as the narrow
extension-facing primitive for synthetic idle work.
- Implements the session side so it refuses to start when:
- the provided input is empty,
- the session is in plan mode,
- a turn is already active, or
- trigger-turn mailbox work is pending.
- Gives trigger-turn mailbox work priority if it appears while the idle
turn is being prepared.
- Wires `GoalExtension::on_thread_idle` to read the active persisted
goal and submit the continuation prompt through this idle-only
primitive.
- Keeps the legacy core goal continuation implementation in place
instead of folding it into this PR.
## Behavior
This is intentionally best-effort. If `try_start_turn_if_idle` observes
that the thread is not idle, or that higher-priority mailbox work should
run first, it returns the input to the caller. The goal extension drops
that continuation prompt and waits for a future idle opportunity instead
of injecting stale synthetic goal text into an active turn.
## Validation
- `just test -p codex-core
try_start_turn_if_idle_rejects_active_turn_without_injecting`
- `just test -p codex-goal-extension`
## Summary
- default multi-agent v2 to direct-model-only tools so code mode does
not wrap subagent tools
- add default root/subagent team prompts aligned with dogfood training
assumptions
- tighten spawn-agent model override wording to prefer the inherited
model by default
## Tests
- just fmt
- just test -p codex-core
spawn_agent_description_lists_visible_models_and_reasoning_efforts
- just test -p codex-core
multi_agent_v2_default_session_thread_cap_counts_root
- just test -p codex-rollout-trace
- just fix -p codex-core
- just fix -p codex-rollout-trace
Note: a broad just test -p codex-core run was attempted locally, but
this sandbox produced unrelated environment failures around
sandbox-exec, missing test_stdio_server, and realtime timeouts.
## Why
This PR
https://github.com/openai/codex/pull/24161#discussion_r3325692763
revealed a subagent data modeling issue, where we overloaded
`forked_from_id` to also mean `parent_thread_id`. That's incorrect since
guardian and review subagents can be a subagent and NOT fork the main
thread's history.
The solution here is to explicitly store a new `parent_thread_id` on
`SessionMeta`, alongside `forked_from_id` which already exists. While
we're at it, also expose it in the app-server protocol on the `Thread`
object.
A thread->subagent relationship and a fork of thread history are
orthogonal concepts.
## What Changed
- Added top-level `parent_thread_id` persistence on `SessionMeta` and
runtime/session plumbing through `SessionConfiguredEvent`,
`CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`,
`TurnContext`, and `ModelClient`.
- Made turn metadata, request headers, analytics, and subagent-start
events read the separate runtime/top-level parent field instead of
deriving general parent lineage from `SessionSource` or
`forked_from_thread_id`.
- Passed parent lineage separately at delegated subagent, review,
guardian, agent-job, and multi-agent spawn construction sites;
copied-history fork lineage remains derived only from `InitialHistory`.
- Persisted and exposed parent lineage through rollout/thread-store
projections and app-server v2 `Thread.parentThreadId`.
- Updated app-server README text and regenerated app-server schema
fixtures for the additive `parentThreadId` response field.
## Summary
PR 3 of 5 in the cloud-managed config client stack.
Adds enterprise-managed cloud config as a first-class config layer
source. The layer metadata is preserved through config loading,
diagnostics, debug output, hook attribution, and app-server protocol
surfaces.
## Details
- Enterprise-managed config becomes a normal config layer source with
backend-supplied `id` and display `name` attached for provenance.
- These layers are designed to behave like non-file managed config: they
can surface syntax/type diagnostics by layer name even though there is
no physical config file.
- Relative path settings are resolved from a stored config base so
cloud-delivered config remains consistent with existing MDM-delivered
config semantics.
- Hook attribution distinguishes config-delivered hooks from
requirements-delivered hooks via `HookSource::CloudManagedConfig`.
- This remains pull-based and snapshot-oriented; the PR adds layer
identity/diagnostics, not dynamic reload behavior.
## Validation
Validated through the targeted stack checks after rebasing onto current
`main`:
- Rust crate tests for
config/hooks/cloud-config/backend-client/app-server-protocol
- Filtered `codex-core` and `codex-app-server` `cloud_config_bundle`
tests
- Python generated-file contract test
- `cargo shear --deny-warnings`
- Targeted `argument-comment-lint` for config/hooks
## Summary
PR 2 of 5 in the cloud-managed config client stack.
Adds a shared requirements-layer composition engine. The composer
defines how ordered requirements layers combine, with focused tests for
the merge semantics and provenance behavior. The final PR in the stack
wires runtime requirements sources into this path.
## Details
- Mental model: requirements layers are ordered lowest priority first,
matching `ConfigLayerStack`; lower-priority layers provide defaults
while higher-priority layers win scalar/list conflicts.
- Regular fields use config-style TOML merging, including recursive
table merging, so requirements layering follows the same broad model as
`config.toml` layering.
- Domain-specific fields keep explicit semantics: `rules.prefix_rules`
and hooks preserve high-priority-first output, hooks fail closed on
active managed-dir conflicts, and `permissions.filesystem.deny_read`
dedupes as a stable high-priority-first union.
- `remote_sandbox_config` is evaluated within each layer before the
regular TOML merge, so host-specific sandbox constraints do not leak
across layers.
- Provenance points at the exact source when one layer owns a value and
uses composite provenance when a table field is assembled from multiple
layers.
## Validation
Local validation:
- `just fmt`
- `cargo check -p codex-config`
- `just test -p codex-config requirements_composition`
- `git diff --check`
CI will run the broader test matrix.
## Why
We want a manual mode that produces the full packaged unsigned macOS
Codex archive, including bundled resources like `rg`, without mixing
those archives into the signing and publishing flow.
The existing `build_unsigned` mode is the handoff used by external
signing and `promote_signed`, so archive-only inspection and local
packaging should live in a separate mode and artifact namespace.
## What Changed
- added `build_unsigned_archive` as a new manual `release_mode`
- kept the existing `build` matrix running for that mode instead of
introducing a separate archive-only job
- wrote unsigned macOS package archives to
`codex-rs/unsigned-archive-dist/...` instead of the normal `dist/...`
tree
- uploaded those packaged macOS outputs as dedicated
`*-unsigned-archive` workflow artifacts
- kept `build_unsigned` and `promote_signed` on their existing raw
unsigned binary path
## Validation
- parsed `.github/workflows/rust-release.yml` with `ruby -e 'require
"yaml"; YAML.load_file(".github/workflows/rust-release.yml")'`
- ran `git diff --check -- .github/workflows/rust-release.yml`
- reviewed the workflow diff to confirm `build_unsigned_archive` now
reuses the existing `build` job while isolating the unsigned macOS
package archives under dedicated artifact names
- locally verified the package builder layout against unsigned macOS
binaries to confirm the packaged archive contains `bin/codex`,
`codex-path/rg`, and `codex-resources/zsh/bin/zsh`
## Summary
PR 1 of 5 in the cloud-managed config client stack.
Adds the generated backend models and client transport surface for the
config bundle endpoint. This bundle endpoint is the replacement backend
surface for legacy cloud requirements; the final PR in the stack
switches runtime consumers over to it.
## Details
- This is transport-only plumbing: no runtime config behavior changes in
this PR.
- The bundle endpoint is the new shared backend surface for
cloud-delivered config and requirements data.
- Both supported path styles are wired here: `/api/codex/config/bundle`
and `/wham/config/bundle`.
- The response types come from generated backend models so later PRs
consume the backend contract directly instead of maintaining
hand-written mirror structs.
## Validation
Validated through the targeted stack checks after rebasing onto current
`main`:
- Rust crate tests for
config/hooks/cloud-config/backend-client/app-server-protocol
- Filtered `codex-core` and `codex-app-server` `cloud_config_bundle`
tests
- Python generated-file contract test
- `cargo shear --deny-warnings`
- Targeted `argument-comment-lint` for config/hooks
## Why
Closes#25006.
`tui.keymap` currently rejects `F13` even though Codex's terminal event
layer can report higher function keys. This prevents users from using
common remappings such as Caps Lock to `F13`.
## What Changed
- Define a shared portable upper bound of `F24` for stored TUI
keybindings.
- Accept `f13` through `f24` in config normalization and runtime
parsing.
- Allow `/keymap` capture to persist `F13` through `F24`.
- Update the unsupported-function-key error and add boundary tests for
`F13`, `F24`, and `F25`.
## How to Test
1. Add a binding such as:
```toml
[tui.keymap.global]
open_transcript = "f13"
```
2. Start Codex and press the remapped `F13` key.
3. Confirm Codex loads the config without the previous `F1 through F12`
error and the action runs.
4. Open `/keymap`, capture `F13` for an action, and confirm the saved
binding is `f13`.
5. As a regression check, try to capture `F25` and confirm Codex reports
that only `F1` through `F24` can be stored.
Targeted tests:
- `just test -p codex-config`
- `just test -p codex-tui function_keys`
Full `just test -p codex-tui` completed with 2,752 passing tests, 4
skipped tests, and two unrelated guardian feature-flag failures:
-
`app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
-
`app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`
## Summary
- Use normal directory loading for plugin install app metadata so
install avoids forced directory refresh while still loading metadata on
cold cache.
- Continue force-refreshing codex_apps tools for auth state.
- Add regression coverage that pre-warms the directory cache and asserts
install returns cached app metadata without extra directory requests.
## Validation
- just fmt
- git diff --check
- just test -p codex-app-server plugin_install_returns_apps_needing_auth
plugin_install_filters_disallowed_apps_needing_auth (blocked locally:
cargo-nextest is not installed)
## Description
Bedrock currently only supports the implicit `default` service tier for
GPT models. This PR strips non-default service tier metadata from
Bedrock model catalogs so Codex does not advertise or send unsupported
tiers.
## What changed
- Normalize both built-in and configured Bedrock catalogs to
default-only service tier behavior.
- Add regression coverage for built-in and configured Bedrock catalogs.
## Validation
- `just fmt`
- `just test -p codex-model-provider`
## Summary
- rename the multi-agent v2 follow-up task tool surface to assign_task
- update core tests and spec-plan expectations
- keep rollout-trace classification backward-compatible with legacy
followup_task
## Tests
- just fmt
- just test -p codex-core
multi_agents_spec::tests::assign_task_tool_requires_message_and_has_no_output_schema
- just test -p codex-rollout-trace
- just fix -p codex-core
- just fix -p codex-rollout-trace
Note: a broad just test -p codex-core run was attempted locally, but
this sandbox produced unrelated environment failures around
sandbox-exec, missing test_stdio_server, and realtime timeouts.
## Problem
Saved threads can already be archived through app-server RPCs, but the
command line did not expose direct archive or unarchive commands.
## Solution
Add `codex archive <thread>` and `codex unarchive <thread>`, resolving
UUIDs or exact thread names before calling the existing `thread/archive`
and `thread/unarchive` RPCs. The commands support scoped remote flags so
callers can target remote app-server endpoints when archiving or
unarchiving threads.
This also fixes a long-standing bug in `codex resume <thread id>` and
`codex fork <thread id>` that I found when testing the new commands.
These operations shouldn't be allowed on archived sessions. They now
fail with an error that tells the user to run `codex unarchive <thread
id>` first.
## Verification
Added app-server coverage for rejecting archived thread resume by id and
checking that the error includes the matching `codex unarchive <thread
id>` command.
## Why
Users following the Amazon Bedrock API-key setup can export
`AWS_BEARER_TOKEN_BEDROCK` and `AWS_REGION`, but Codex's bearer-token
auth path only accepted `model_providers.amazon-bedrock.aws.region`.
That made the documented env-based setup fail with a missing-region
error even though the standard AWS region environment variable was
present.
## What Changed
- Updates Bedrock bearer-token region resolution to use
`model_providers.amazon-bedrock.aws.region` first, then fall back to
`AWS_REGION`, then `AWS_DEFAULT_REGION`.
- Updates the missing-region error to list all supported region sources.
- Adds focused coverage for config precedence, `AWS_REGION`,
`AWS_DEFAULT_REGION`, and the missing-region failure.
## Summary
- Use the session-loaded plugin app IDs as the source of connector
suggestion candidates.
- Remove the redundant plugin reload from
`tool_suggest_connector_ids()`.
- Add regression coverage for connectors declared by a loaded remote
plugin, using the Databricks app case.
## Context
Loaded remote plugins can declare app connector IDs in `.app.json`. The
session-owned `PluginsManager` already loads those plugins and exposes
their effective app IDs.
The connector suggestion path was creating a separate `PluginsManager`
and recomputing plugin app IDs. That new manager does not share the
session manager’s remote installed plugin cache, so app IDs from loaded
remote plugins were missing from connector suggestions.
## Fix
Pass the already-loaded effective app IDs into connector suggestion
generation and use them directly as the plugin-derived connector
candidate set.
Connector candidates are now built from:
- App IDs declared by loaded plugins
- Explicitly configured connector discoverables
- Existing disabled-suggestion filtering
This avoids a second plugin-manager lookup and keeps connector
suggestions aligned with the plugins actually loaded for the turn.
## Behavior
For example, when a plugin is loaded and its `.app.json` declares data
apps, `list_available_plugins_to_install` can now return those data
connectors.
This does not create plugin suggestions from the plugin itself. Plugin
suggestions still come from eligible uninstalled entries in the
marketplace catalog and require existing matching/filtering rules.
## Validation
- `just fmt`
- Added regression coverage for a loaded-plugin connector ID appearing
in discoverable tools
- Attempted `just test -p codex-core`; the command exited unsuccessfully
in the local test environment without useful failure detail captured in
the run output
# Why
Managed requirements can already constrain sandbox policy choices, but
Windows sandbox implementation selection was still resolved
independently from those requirements. That left the TUI able to
continue through the unelevated fallback even when an organization wants
to require the elevated Windows sandbox implementation.
# What
- Add `[windows].allowed_sandbox_implementations` requirements support
for the Windows `elevated` and `unelevated` implementations.
- Apply that allowlist during core config resolution so disallowed
configured or feature-selected Windows sandbox implementations fall back
to an allowed implementation with the existing requirements warning
path.
- Reuse the existing TUI Windows setup prompts to block disallowed
unelevated continuation, keep required elevated setup in front of the
user, and refuse to persist a TUI-selected Windows sandbox mode that
requirements disallow.
# Semantics
| Allowed | Selected | Effective |
| --- | --- | --- |
| `["elevated"]` | `unelevated` / unset | `elevated` |
| `["unelevated"]` | `elevated` / unset | `unelevated` |
| `["elevated", "unelevated"]` | `elevated` | `elevated` |
| `["elevated", "unelevated"]` | `unelevated` | `unelevated` |
| `["elevated", "unelevated"]` | unset | `elevated` |
Availability is handled by interactive setup surfaces after allowlist
resolution. If the effective elevated implementation is not ready,
elevated-only requirements block on setup. When unelevated is also
allowed, the UI may offer the existing unelevated fallback.
## TUI Screens
If elevated setup is not already complete:
```
Your organization requires the default Codex agent sandbox to continue. Set it up to protect your files and control
network access.
Learn more <https://developers.openai.com/codex/windows>
› 1. Set up default sandbox (requires Administrator permissions)
2. Quit
```
If admin setup fails under `["elevated"]`:
```
Couldn't set up your sandbox with Administrator permissions
Your organization requires the default sandbox before Codex can continue.
Learn more <https://developers.openai.com/codex/windows>
› 1. Try setting up admin sandbox again
2. Quit
```
# Next Steps
- extend the requirements/readout surface, such as
`configRequirements/read`, so clients can inspect the loaded
`[windows].allowed_sandbox_implementations` requirement instead of
inferring it from Windows setup state
- consider extending `windowsSandbox/readiness` as well
- update the App startup guide, setup flow, and banner surfaces so an
elevated-only requirement omits any continue-unelevated escape hatch and
blocks startup until a permitted implementation is ready;
- preserve the existing unelevated fallback path when requirements allow
it, including the `["unelevated"]` case where elevated is disallowed
## Summary
- Keep the original `TOOL_SUGGEST_DISCOVERABLE_PLUGIN_ALLOWLIST` as a
fallback seed list, so users with no installed plugins still get initial
install suggestions.
- Allow additional install suggestions from trusted marketplaces:
`openai-curated` and `openai-bundled`.
- Require non-fallback, non-configured marketplace candidates to share
`.app.json` connector IDs with already installed plugins.
- Preserve explicit configured plugin discoverables as an override,
while still omitting installed, disabled, and `NOT_AVAILABLE` plugins.
## Context
`list_available_plugins_to_install` controls which plugins the model can
trigger via `request_plugin_install`. We want a small starter set for
empty/new users, but we also want installed workflow plugins to unlock
relevant source plugins without maintaining every source plugin ID by
hand.
This keeps the legacy plugin ID allowlist only as the starter fallback.
For everything else, the trusted marketplace is the candidate boundary,
and installed app connector overlap is the relevance filter. For
example, an installed Sales plugin can make HubSpot and Granola
suggestible when those source plugins are in `openai-curated` and share
Sales app connector IDs, while an unrelated test-source plugin with an
app connector not declared by Sales stays hidden.
## Test Coverage
- Empty/no-installed-plugin case: returns the fallback seed plugins from
the original allowlist.
- Installed-app expansion: returns non-fallback marketplace plugins only
when their app connector IDs overlap with an installed plugin.
- Sales workflow case: installed Sales declares HubSpot and Granola
apps, so `hubspot@openai-curated` and `granola@openai-curated` are
returned.
- Sales negative case: `test-source@openai-curated` has an app connector
not declared by Sales, so it is not returned.
- Existing guardrails: installed plugins, disabled suggestions, and
`NOT_AVAILABLE` plugins remain omitted; explicit configured
discoverables still work as an override.
## Validation
- `just fmt`
- `just test -p codex-core plugins::discoverable::tests`
- `just test -p codex-core` was attempted earlier, but current `main` /
local env failed with unrelated existing failures around missing
`test_stdio_server`, CLI/code-mode MCP tool setup, and
unified_exec/shell snapshot flakes/timeouts. The touched discoverable
tests pass.
## Summary
- add Vim normal-mode `s` support to substitute the character under the
cursor and enter insert mode
- fix Vim normal-mode `o` so opening below the final line moves the
cursor onto the new blank line
- update keymap config/schema and keymap picker snapshots for the new
action
## Validation
- `just fmt`
- `just write-config-schema`
- `just test -p codex-config`
- focused `just test -p codex-tui` coverage for the Vim `s` and `o`
behavior, keymap conflict handling, and keymap picker snapshots
- `cargo insta pending-snapshots --manifest-path tui/Cargo.toml`
- `git diff --check`
## Notes
A full `just test -p codex-tui` run still has two unrelated Guardian
feature-flag failures in this checkout:
-
`app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
-
`app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`
## Summary
- preserve macOS `__CF_USER_TEXT_ENCODING` when launching the sandboxed
fs helper
- keep the fs-helper env narrow; this adds only the CoreFoundation
startup var instead of copying the broader MCP stdio baseline
- add focused coverage that the helper keeps that var without admitting
`HOME`
## Diagnosis
The sandboxed fs helper is not launched like a normal child process.
Exec-server rebuilds its environment from an allowlist, then calls
`env_clear()` before re-execing Codex with `--codex-run-as-fs-helper`.
That helper dispatches before the normal Codex startup path and only
needs to boot a small Tokio runtime, read one JSON request from stdin,
perform the direct filesystem operation, and write one JSON response.
The reported macOS hang sampled the helper before Rust main, in
CoreFoundation initialization while resolving the default text encoding:
`_CFStringGetUserDefaultEncoding -> getpwuid_r -> notify_register_check
-> bootstrap_look_up3 -> mach_msg2_trap`. The fs-helper allowlist kept
`PATH` and temp vars for runtime needs, but it dropped macOS
`__CF_USER_TEXT_ENCODING`. Other Codex subprocess launchers that
intentionally build a minimal Unix baseline, such as MCP stdio, already
preserve that variable.
My read is that stripping `__CF_USER_TEXT_ENCODING` forced this internal
helper down CoreFoundation's fallback user-lookup path, and that lookup
intermittently wedged on the affected machine before the helper could
read stdin or touch the target file. Preserving only this macOS startup
variable avoids that fallback without broadening the fs-helper
environment to shell-like vars such as `HOME`, `USER`, locale settings,
terminal settings, or proxy credentials.
Internal Slack thread omitted from the public PR body.
## Validation
- `cd codex-rs && just fmt`
- `git diff --check`
## Summary
This adds `environment: issue-triage` to the Codex-calling issue
workflow jobs so they can read the GitHub Environment Secret while
staying on GitHub-hosted runners for public issue-triggered workflows.
## Why
The standalone `/v1/alpha/search` request now requires a `model`, but
the `web.run` extension currently omits it.
Adds `model` to extension `ToolCall` invocation.
Follow-up to #23823.
## What changed
- Make `SearchRequest.model` required.
- Expose the effective per-turn model on extension tool calls and pass
it in standalone web-search requests.
- Assert the model is forwarded in the app-server round-trip test.
## Testing
- `just test -p codex-api -p codex-tools -p codex-web-search-extension
-p codex-memories-extension -p codex-goal-extension`
- `just test -p codex-core -E
'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'`
- `just test -p codex-app-server -E
'test(standalone_web_search_round_trips_encrypted_output)'`
## Why
`SandboxPolicy` is the legacy compatibility shape, but
`codex-thread-store` still exposed it through `StoredThread`,
`ThreadMetadataPatch`, and live metadata sync. That kept thread-store
consumers tied to the legacy representation and meant richer permission
profile data could not round-trip through thread metadata or cold
rollout reconciliation.
## What Changed
- Replaced thread-store `sandbox_policy` API fields with canonical
`PermissionProfile` fields.
- Persist new permission-profile metadata as canonical JSON in the
existing SQLite metadata slot while continuing to read older legacy
sandbox policy values.
- Updated local, in-memory, live metadata sync, and rollout extraction
paths to propagate `TurnContextItem::permission_profile()`.
- Re-materialize legacy permission metadata against the final rollout
cwd when rollout-derived metadata replaces stale SQLite summaries.
- Updated affected app-server and core test constructors to build
`PermissionProfile` values directly.
## Test Plan
- `cargo test -p codex-state`
- `cargo test -p codex-thread-store`
- `cargo test -p codex-app-server
summary_from_stored_thread_preserves_millisecond_precision --lib`
- `cargo test -p codex-core realtime_context --lib`
## Summary
Introduce a `CodeModeSession` interface for executing and managing
code-mode cells.
This moves cell lifecycle, callback delegation, termination, and
shutdown behind a session abstraction, while continuing to use the
existing in-process implementation, and the ability to implement an
external process one behind this interface.
A Codex session owns one `CodeModeSession`, which in turn owns its
running cells and stored code-mode state. Each cell is represented to
the caller as a `StartedCell`, exposing its cell ID and initial
response.
It also introduces a `CodeModeSessionDelegate` callback interface. A
session uses the delegate to invoke nested host tools and emit
notifications while a cell is running, allowing the runtime to
communicate with its owning Codex session without depending directly on
core turn handling.
<img width="2121" height="1001" alt="image"
src="https://github.com/user-attachments/assets/c349a819-2a59-485c-bda4-2caf68ac4c31"
/>
## Summary
- terminate sandbox filesystem helpers when the Tokio child handle is
dropped
## Why
A sandbox filesystem helper can stall during process startup before
reading stdin. If the owning async operation is cancelled or torn down,
the spawned helper should not remain running as an orphaned process.
Setting `kill_on_drop(true)` gives the filesystem helper the cleanup
behavior that Tokio child processes otherwise do not enable by default.
This intentionally does not add a timeout. It does not detect or recover
an active hung file edit while the owning future remains alive. A more
precise startup-health mechanism can be handled separately.
## Validation
- `just test -p codex-exec-server` (186 tests passed; benchmark smoke
passed)
- `just fmt`
- `just fix -p codex-exec-server`
- `git diff --check`
## Why
We recently added `forked_from_thread_id` which lets us trace where a
thread's _context_ comes from, but we also want to understand subagent
lineage (e.g. which parent thread spawned this subagent? what kind of
subagent is it?) which is orthogonal.
This PR adds `parent_thread_id` and `subagent_kind` to the
`x-codex-turn-metadata` header sent to ResponsesAPI.
## What changed
- Adds `parent_thread_id` and `subagent_kind` to core-owned
`x-codex-turn-metadata`.
- Restores persisted `SessionSource` and `ThreadSource` from resumed
session metadata so cold-resumed subagent threads keep their lineage on
later Responses API requests.
- Centralizes parent-thread extraction on `SessionSource` /
`SubAgentSource` and reuses it in the Responses client, analytics, agent
control, and state parsing paths.
- Extends reserved-key, git-enrichment, thread-spawn, and app-server v2
metadata coverage for the new lineage fields.
## Verification
- Not run locally per request.
- Added focused coverage in `core/src/turn_metadata_tests.rs` and
`app-server/tests/suite/v2/client_metadata.rs`.
## Why
TUI users can archive saved sessions from other surfaces, but there is
no in-session command for archiving the active session. Since archiving
the active session also exits the TUI, the command should ask for
explicit confirmation instead of firing immediately.
I'm also working on [a companion
PR](https://github.com/openai/codex/pull/25021) that adds `codex
archive` and `codex unarchive` top-level CLI commands.
## What changed
- Adds a new `/archive` slash command described as `archive this session
and exit`.
- Shows a confirmation dialog with `No, don't archive` selected first
and `Yes, archive and exit` as the explicit action.
- On confirmation, calls the existing `thread/archive` app-server RPC
for the active main session and exits after success.
- Keeps `/archive` disabled while a task is running and unavailable in
side conversations.
## Verification
Added focused TUI coverage for the `/archive` confirmation flow,
disabled-while-task-running behavior, and the `/ar` slash-command popup
snapshot.
## Summary
The desktop app now presents the on-request permissions mode as `Ask for
approval` and the manual-review-backed mode as `Approve for me`. The TUI
still exposed older/internal labels like `Default` and `Auto-review`,
which made the same underlying settings look different across clients.
This updates the TUI UX copy to match the app without changing the
underlying default behavior. Fresh threads continue to use the existing
on-request approval mode, now displayed as `Ask for approval`.
The label changes cover `/permissions`, explicit profile permissions
menus, status surfaces, config persistence history/error text, and the
corresponding TUI snapshots.
### Before
<img width="1181" height="119" alt="Screenshot 2026-05-28 at 10 19
47 PM"
src="https://github.com/user-attachments/assets/0664846b-b6dd-4931-b4dd-d0af0d42058e"
/>
<img width="523" height="19" alt="Screenshot 2026-05-28 at 10 21 29 PM"
src="https://github.com/user-attachments/assets/7899c33e-b35d-4684-8389-97e357803423"
/>
### After
<img width="1216" height="117" alt="Screenshot 2026-05-28 at 10 19
32 PM"
src="https://github.com/user-attachments/assets/015aab43-ac97-411f-8031-75cdd887251b"
/>
<img width="567" height="18" alt="Screenshot 2026-05-28 at 10 20 24 PM"
src="https://github.com/user-attachments/assets/28b6422c-b823-4298-b221-c83d46d09d66"
/>
## Why
Some Windows users do not have local admin access, so they cannot
complete the elevated portion of the Windows sandbox setup when Codex
first needs it. This adds an alpha provisioning path that an admin or IT
deployment script can run ahead of time for the Codex user.
The intended managed-deployment shape is:
```powershell
codex sandbox setup --elevated --user "$env:COMPUTERNAME\Alice" --codex-home "C:\Users\Alice\.codex"
```
`--elevated` is treated as the requested sandbox setup level, not as
proof that the process is elevated. The Windows sandbox setup
orchestration still checks that the caller is actually elevated before
launching the helper without a UAC prompt.
## What changed
- Added `codex sandbox setup --elevated` with explicit user selection
via either `--current-user` or `--user ... --codex-home ...`.
- Moved the CLI implementation into `cli/src/sandbox_setup.rs` instead
of growing `cli/src/main.rs`.
- Added a Windows sandbox `ProvisionOnly` helper mode that runs the
elevation-required provisioning work without requiring a workspace cwd
or runtime sandbox policy.
- Reused the existing elevated helper path for creating/updating sandbox
users, configuring firewall/WFP rules, and applying sandbox directory
ACLs.
- Persisted `windows.sandbox = "elevated"` into the target `CODEX_HOME`
so the desktop app does not show the initial sandbox setup banner after
pre-provisioning succeeds.
## Validation
- `cargo fmt -p codex-windows-sandbox -p codex-core -p codex-cli`
- `cargo test -p codex-cli sandbox_setup --target-dir
target\sandbox-setup-check`
- `cargo test -p codex-windows-sandbox
payload_accepts_provision_only_mode --target-dir
target\sandbox-setup-check`
- `git diff --check`
- Manual Windows alpha flow with a standard local user (`Mandi Lavida`):
ran the new setup command from an admin shell, verified the target
`.codex` contents, sandbox marker/secrets, ACLs, firewall rules, and
desktop startup without the sandbox setup banner once experimental
network proxy requirements were disabled.
## Notes
This intentionally does not solve later elevated update coordination for
IT-managed deployments. The setup command can still apply provisioning
updates when run again, but a broader coordination/process story is out
of scope for this alpha.
## Why
The standalone `image_gen.imagegen` extension should behave like native
image generation for artifact persistence and UI completion, while
returning its save-location guidance as part of the tool result instead
of injecting a developer message.
## What Changed
- Added an image-generation completion hook for extension tools so core
can persist generated images and emit the existing `ImageGeneration`
lifecycle events.
- Reused core image artifact persistence for extension output and
removed extension-local save-path/file-writing logic.
- Split shared image persistence from built-in finalization so native
image generation keeps its existing developer-message instruction
behavior.
- Returned the generated image save-location instruction through the
extension `FunctionCallOutput`, alongside the generated image input for
model follow-up.
- Preserved the existing image-generation event shape for current UI and
replay compatibility.
- Avoided cloning the full generated-image base64 payload when emitting
the in-progress image item.
- Removed dependencies no longer needed after moving persistence out of
the extension crate.
## Fast Follow
- Adjust the existing Extension API and add a general `TurnItem`
finalization path for re-usability of code
## Validation
- Ran `just fmt`.
- Ran `just bazel-lock-update`.
- Ran `just bazel-lock-check`.
- Ran `just test -p codex-tools -p codex-extension-api -p
codex-image-generation-extension`.
- Ran `just test -p codex-core
image_generation_publication_is_finalized_by_core`.
- Ran `just test -p codex-core
handle_output_item_done_records_image_save_history_message`.
- Ran `just fix -p codex-tools -p codex-extension-api -p codex-core -p
codex-image-generation-extension`.
Ensures MCP-backed `codex-core` integration tests exercise initialized
servers instead of racing server startup.
I've been idly investigating a few flakes and the failure modes are much
more confusing when a tool call fails because of a failed server start
than when the failed server start causes the test to fail directly.
Adds failure-only logging for MCP streamable HTTP post_message calls and
the underlying reqwest send path, capturing the MCP method/request id,
endpoint shape, auth-header presence, timeout/connect classification,
and sanitized error source chain without logging headers, bodies,
tokens, or full URLs.
## Why
`core/src/config/edit.rs` owns the config edit state machine, but it
also carried the TOML document helper code inline as a nested module.
Moving those helpers into their own file keeps the edit orchestration
easier to scan without changing the config persistence behavior.
## What changed
- Moved the existing `document_helpers` module from
`core/src/config/edit.rs` into
`core/src/config/edit/document_helpers.rs`.
- Added `mod document_helpers;` so the existing `pub(super)` helper API
remains available to the rest of `config::edit`.
## Testing
Not run; this is a refactor-only module extraction with no intended
behavior change.
## Why
Standalone `web.run` calls run in the extension, so they need normal
web-search progress activity while a request is in flight and durable
completed activity after a thread is reloaded.
Follow-up to #23823; uses the extension turn-item emission path added in
#24813.
## What changed
- Emit standalone `web.run` start/completion items through the host
turn-item emitter, preserving standard client delivery and rollout
persistence.
- Include useful completion detail for queries, image queries, and
literal-URL `open`/`find` commands.
- Render completed searches as `Searched the web` or `Searched the web
for <detail>`, with snapshot coverage for the detail-free case.
- Extend the app-server round-trip test to verify completed search
activity is reconstructed by `thread/read` after a fresh-process reload.
## Testing
- `just test -p codex-web-search-extension`
- `just test -p codex-app-server -E
"test(standalone_web_search_round_trips_encrypted_output)"`
## Why
Some models need to select their code-execution behavior through model
catalog metadata. Models without that metadata must continue to follow
the existing `CodeMode` and `CodeModeOnly` feature flags, including when
a newer server sends an enum value this client does not recognize.
## What changed
- add optional `ModelInfo.tool_mode` metadata with `direct`,
`code_mode`, and `code_mode_only`
- treat omitted and unknown wire values as `None`
- resolve `None` from the existing feature flags
- carry the resolved `ToolMode` directly on `TurnContext`, outside
`Config`
- use the resolved value for turn creation, model switches, review
turns, tool planning, and code execution
## Coverage
- add protocol coverage for omitted, known, and unknown enum values
- add focused coverage for flag fallback and explicit metadata
overriding feature flags
- add core integration coverage that fetches remote model metadata
through `/v1/models` and verifies the outbound `/responses` tools for
explicit `direct` and `code_mode_only` selectors
## Stack
- followed by #25032
# Why
Fixes#24529. Completed hook output in the TUI rendered each
`HookOutputEntry` as one ratatui line, so explicit newlines inside hook
output were not shown as separate transcript rows. That made multiline
`SessionStart.additionalContext` hard to inspect even though the
model-facing context path preserved the original text.
# What
- Split completed hook output entries on explicit newlines before
rendering them in `codex-rs/tui/src/history_cell/hook_cell.rs`.
- Keep the hook output prefix, such as `hook context:` or `warning:`, on
the first physical line only.
- Preserve explicit blank lines and render continuation lines with the
hook body indent.
- Add unit coverage for multiline context and warning output, plus a
chatwidget snapshot regression for `SessionStart` history output.
# Testing
- `cargo nextest run -p codex-tui completed_hook_multiline
hook_completed_before_reveal_renders_completed_without_running_flash`
- `just argument-comment-lint -p codex-tui -- --ignore-rust-version
--lib --tests`
## Summary
Remove a stale `TODO(jif)` block of commented-out rollout listing tests
that still referenced an older listing API.
The current rollout listing behavior is covered by the active state DB
and filesystem fallback tests, so keeping the dead commented tests just
adds noise.
## Validation
- `just fmt`
- `just test -p codex-rollout`
## Summary
- handle goal usage-limit turn errors in the goal extension
- exercise the extension path in the goal backend test
## Tests
- just fmt
- just test -p codex-goal-extension
- just fix -p codex-goal-extension
## Summary
- Clarify default, omission, and bounded behavior across built-in tool
schemas, including unified exec, classic shell, Code Mode exec/wait,
multi-agent, agent job, MCP resource, image, goal, plan, tool_search,
and test-sync fields.
- Convert update_plan status to an enum and add short field descriptions
where the schema previously relied on surrounding context.
- Remove the dedicated permission-approval schema test and keep only
updates to existing expected-spec tests.
## Validation
- Ran `just fmt`.
- Ran `git diff --check`.
- Did not run clippy or tests, per request.
Regression has been eval
[here](https://openai.slack.com/archives/C09GDSP1J9X/p1779905065496949)
and we proved there are no regressions
Deletes `codex-rs/debug-client/src/state.rs` as one step in removing the
stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/src/reader.rs` as one step in removing
the stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/src/output.rs` as one step in removing
the stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/src/main.rs` as one step in removing the
stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/src/commands.rs` as one step in removing
the stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/src/client.rs` as one step in removing
the stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/README.md` as one step in removing the
stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
Deletes `codex-rs/debug-client/Cargo.toml` as one step in removing the
stale app-server debug client.
This intentionally leaves Cargo workspace and lockfile cleanup for a
later follow-up PR.
## Why
This PR is stacked on #24918, which moves goal steering onto
source-labeled internal model context fragments. Active-turn goal
steering should use the same running-turn injection path as other
runtime steering, so those fragments enter the pending input queue as
`ResponseItem`s through the existing
[`Session::inject_if_running`](8d6f6cdf69/codex-rs/core/src/session/inject.rs (L12-L27))
behavior instead of through a goal-specific conversion wrapper.
## What Changed
- Exposes a narrow `CodexThread::inject_if_running` bridge for callers
that only hold a thread handle.
- Changes `ext/goal` active-turn steering to pass `ResponseItem`s
directly.
- Builds goal steering prompts as contextual internal model context
`ResponseItem`s before injecting them into the running turn.
## Testing
Not run locally; PR metadata update only.
## Why
Goal steering is one form of runtime-owned model context, but the old
`<goal_context>` wrapper made the contextual-fragment hiding path
goal-specific. Using a source-labeled internal context fragment gives
core and extensions a shared shape for hidden model steering while
keeping those prompts out of visible turn history.
The change also keeps legacy `<goal_context>` messages recognized as
hidden contextual input so existing stored history does not start
rendering old goal-steering prompts as user-visible turn items.
## What Changed
- Replaces `GoalContext` with `InternalModelContextFragment` plus a
validated `InternalContextSource`.
- Renders goal steering as `<codex_internal_context
source="goal">...</codex_internal_context>`.
- Updates core goal steering and `ext/goal` steering to inject the new
internal-context fragment.
- Updates contextual-fragment, event-mapping, goal, and session tests
for the new wrapper.
## Test Coverage
- Adds coverage for detecting the new internal model context fragment.
- Preserves coverage for hiding legacy `<goal_context>` fragments.
- Verifies invalid internal context sources are rejected and arbitrary
context tags are not hidden.
- Updates goal steering/session assertions to expect the new
`source="goal"` wrapper.
## Summary
`fs/watch` was using a local debounce wrapper whose deadline was
initialized once and then reused after the first batch. Once that stale
deadline was in the past, later file changes could bypass the intended
200ms debounce and send noisier `fs/changed` notifications.
This moves the debounce wrapper into `codex-file-watcher` as
`DebouncedWatchReceiver`, resets the debounce deadline for each event
batch, preserves pending paths across cancelled receives, and updates
app-server `fs/watch` to use the shared wrapper.
Fixes#24692.
## Why
Permission profiles can mark filesystem entries as unreadable with
`deny` rules, including glob patterns. Several shell execution paths
treated known-safe commands or execpolicy `allow` rules as sufficient to
run outside the filesystem sandbox. That is not valid for read-capable
commands: for example, `cat` or `ls` may be reasonable to allow
generally, but dropping the sandbox would also drop deny-read
constraints such as `**/*.env`.
## What changed
- Added a shared check that treats active deny-read restrictions as
incompatible with unsandboxed execution.
- Kept first-attempt execution sandboxed for explicit escalation and
execpolicy allow bypasses when deny-read entries are present.
- Prevented no-sandbox retry after a sandbox denial when the active
filesystem policy contains deny-read entries.
- Updated the zsh-fork execve path so prefix-rule `allow` decisions
continue inside the current sandbox when deny-read restrictions are
active.
## Verification
- `cargo test -p codex-core tools::sandboxing::tests`
- `cargo test -p codex-core
tools::runtimes::shell::unix_escalation::tests`
- `cargo test -p codex-core
shell_command_enforces_glob_deny_read_policy`
## Why
When the TUI resumes a thread, transcript replay renders prior user
messages but did not seed the composer history. That leaves the resumed
session with empty in-memory prompt history, so pressing Up can fall
through to persisted global history and surface a prompt from another
thread.
The expected behavior is that prompts from the resumed thread are
recalled first, with global history only as a fallback.
## What changed
- Record replayed user messages into the composer history during resume
replay.
- Preserve the existing persisted history format and avoid any startup
history scan.
- Add focused TUI coverage showing replayed prompts are recalled before
persisted global history.
## Validation
- Added `replayed_user_messages_seed_composer_history` in
`codex-rs/tui/src/chatwidget/tests/history_replay.rs`.
- `just test -p codex-tui replayed_user_messages_seed_composer_history`
passed.
## Summary
This fixes BUGB-17567 by preventing non-Windows command safety
classification from invoking the Windows PowerShell safelist/parser
path.
Previously, `is_known_safe_command` called the Windows PowerShell
classifier on every platform. That classifier recognizes
`pwsh`/`powershell` by basename and delegates script parsing to the
PowerShell AST parser. The parser starts the supplied executable, so on
macOS/Linux a repository-controlled `pwsh` path could execute during
safety parsing before the normal sandboxed command execution path.
The change gates the Windows PowerShell classifier and module behind
`#[cfg(windows)]`. On macOS/Linux, PowerShell-looking commands are no
longer auto-approved by the Windows classifier and instead fall through
to the normal non-Windows safe-command logic.
## Validation
- `/private/tmp/codex-tools/bin/just fmt`
- `PATH=/private/tmp/codex-tools/bin:$PATH
/private/tmp/codex-tools/bin/just test -p codex-shell-command`
The focused test run passed 135 tests with 0 skipped and completed the
crate bench-smoke step.
## Notes
This PR is scoped to the BUGB-17567 macOS/Linux path. Windows still uses
the PowerShell classifier; a separate hardening follow-up should ensure
Windows safety parsing only executes a trusted PowerShell parser binary
and does not spawn the command's `argv[0]` when that path may be
repository-controlled.
## Why
`codex-backend` now authenticates remote-control server websocket
connections with short-lived server tokens instead of the user's ChatGPT
access token. `app-server` needs to mint and refresh those server tokens
without persisting them, so a restart can reconnect from durable
enrollment identity while keeping the bearer token memory-only.
## What Changed
Updated the remote-control transport to consume `remote_control_token`
and `expires_at` from server enroll responses and added
`/server/refresh` support for persisted enrollments or expiring cached
tokens.
Websocket handshakes now send `Authorization: Bearer
<remote_control_token>` with the existing server identity headers, and
no longer send the ChatGPT bearer token or `chatgpt-account-id` on that
websocket path.
The in-memory enrollment state now owns the ephemeral server token
cache, while SQLite still persists only `server_id`, `environment_id`,
and `server_name`. Websocket `401`/`403` clears only the cached token
for refresh on reconnect; websocket or refresh `404` clears stale
persisted enrollment and re-enrolls. Response body previews redact
`remote_control_token` before surfacing parse errors.
## Verification
- `just test -p codex-app-server-transport`
- Manual prod smoke with an isolated `CODEX_HOME`: `codex remote-control
--json -c 'chatgpt_base_url="https://chatgpt.com/backend-api"'` reached
`status:"connected"` with
`environmentId:"env_i_6a17d9f1d764832986da2e80f4554f1b"`.
# Why
Fixes#23993.
Hook command output schemas are published as the contract for hook
authors and schema-driven tooling. The event-specific output schemas
previously described `hookSpecificOutput.hookEventName` as the global
`HookEventNameWire` enum, so a `pre-tool-use.command.output` schema
would validate mismatched values like `PostToolUse`. That made the
schemas less precise than the intended event-specific contract.
# What
Constrain each hook-specific output schema to the matching literal
`hookEventName` value, mirroring the existing input-schema shape.
Also split `SubagentStartHookSpecificOutputWire` from the session-start
output wire so `subagent-start.command.output.schema.json` can emit
`const: "SubagentStart"` instead of sharing the session-start
definition.
# Verification
- `cargo nextest run -p codex-hooks`
- `just fix -p codex-hooks`
- `just argument-comment-lint -p codex-hooks -- --all-targets`
## Why
The Windows Bazel job on `main` started failing after #24108 because one
Windows-only capture test still passed `cwd.as_path()` to
`run_windows_sandbox_capture`. That helper now expects the explicit
`workspace_roots` slice introduced by #24108, so the Windows test target
no longer compiled.
## What Changed
- Updates `legacy_capture_cancellation_is_not_reported_as_timeout` to
pass `workspace_roots_for(cwd.as_path()).as_slice()`, matching the
adjacent capture test and the new runner signature.
## Verification
- GitHub Actions CI is the important validation for this Windows-only
compile path.
- Created quickly to get Windows CI running while the separate Ubuntu
`compact_resume_fork` timeout is still under investigation.
## Why
#23813 switches the Windows sandbox runner path to `PermissionProfile`,
but it still left one runtime anchor for resolving symbolic
`:workspace_roots` entries. That is not enough once a turn has multiple
effective workspace roots: exact entries and deny globs under
`:workspace_roots` need to be materialized for every runtime root before
the command runner chooses token mode or builds ACL plans.
## What Changed
- Replaces the Windows runner/setup `permission_profile_cwd` plumbing
with `workspace_roots: Vec<AbsolutePathBuf>`.
- Resolves Windows-local `PermissionProfile` data with
`materialize_project_roots_with_workspace_roots(...)` instead of the
single-cwd helper.
- Threads `Config::effective_workspace_roots()` through core execution,
unified exec, TUI setup/read-grant flows, app-server setup, app-server
`command/exec`, and `debug sandbox` on Windows.
- Preserves those workspace roots through the zsh-fork escalation
executor instead of rebuilding them from `sandbox_policy_cwd`.
- Makes `ExecRequest::new(...)` and the remaining
`build_exec_request(...)` helper path take
`windows_sandbox_workspace_roots` explicitly so new call sites cannot
silently fall back to `vec![cwd]`.
- Clarifies the `debug sandbox` non-Windows comment: remaining
cwd-dependent resolution still uses `sandbox_policy_cwd`, while
`:workspace_roots` entries are already materialized from config roots.
- Updates elevated runner IPC `SpawnRequest` to send `workspace_roots`
and bumps the framed IPC protocol version to `3` for the payload shape
change.
- Adds Windows-local resolver coverage for expanding exact and glob
`:workspace_roots` entries across multiple roots, plus core helper
coverage proving explicit roots are preserved.
## Verification
- `cargo check -p codex-windows-sandbox -p codex-core -p codex-tui -p
codex-cli -p codex-app-server`
- `cargo test -p codex-windows-sandbox`
- `cargo test -p codex-core windows_sandbox`
- `cargo test -p codex-core unix_escalation`
- `cargo test -p codex-app-server windows_sandbox`
- `cargo test -p codex-tui windows_sandbox`
- `cargo test -p codex-cli debug_sandbox`
- `just test -p codex-core unified_exec`
- `just test -p codex-core
build_exec_request_preserves_windows_workspace_roots`
- `env -u CODEX_NETWORK_PROXY_ACTIVE -u
CODEX_NETWORK_ALLOW_LOCAL_BINDING just test -p codex-app-server --lib
command_exec`
- `just test -p codex-windows-sandbox`
- `just test -p codex-exec sandbox`
- `just fix -p codex-core -p codex-app-server -p codex-windows-sandbox`
A local macOS cross-check with `cargo check --target
x86_64-pc-windows-msvc ...` did not reach crate Rust code because native
dependencies require Windows SDK headers (`windows.h` / `assert.h`) in
this environment; Windows CI remains the real target validation.
Two local targeted filters compile but do not run assertions on macOS:
`env -u CODEX_NETWORK_PROXY_ACTIVE -u CODEX_NETWORK_ALLOW_LOCAL_BINDING
just test -p codex-app-server --lib command_exec_processor` matched zero
tests, and `just test -p codex-linux-sandbox landlock` matched zero
tests because the landlock suite is Linux-only.
## Summary
Some permission profiles can encode filesystem reads that should remain
unavailable to the agent. Before this change, the model-visible context
and automatic approval review prompt summarized the effective
permissions as a legacy sandbox mode, which can omit permission-profile
filesystem entries from escalation decisions.
For example, a profile can grant workspace access while denying a
private subtree across every workspace root:
```toml
default_permissions = "restricted-workspace"
[permissions.restricted-workspace.workspace_roots]
"/Users/alice/project" = true
"/Users/alice/other-project" = true
[permissions.restricted-workspace.filesystem]
":minimal" = "read"
[permissions.restricted-workspace.filesystem.":workspace_roots"]
"." = "write"
"private" = "deny"
"private/**" = "deny"
```
The context window now describes the workspace roots and effective
filesystem side of the `PermissionProfile` directly, with deny entries
marked as non-escalatable:
```xml
<environment_context>
<cwd>/Users/alice/project</cwd>
<shell>zsh</shell>
<filesystem><workspace_roots><root>/Users/alice/project</root><root>/Users/alice/other-project</root></workspace_roots><permission_profile type="managed"><file_system type="restricted"><entry access="read"><special>:minimal</special></entry><entry access="write"><path>/Users/alice/project</path></entry><entry access="write"><path>/Users/alice/other-project</path></entry><entry access="deny" escalatable="false"><path>/Users/alice/project/private</path></entry><entry access="deny" escalatable="false"><path>/Users/alice/other-project/private</path></entry><entry access="deny" escalatable="false"><glob>/Users/alice/project/private/**</glob></entry><entry access="deny" escalatable="false"><glob>/Users/alice/other-project/private/**</glob></entry></file_system></permission_profile></filesystem>
</environment_context>
```
Managed requirements can impose the same kind of deny-read restriction:
```toml
[permissions.filesystem]
deny_read = [
"/Users/alice/project/private",
"/Users/alice/project/private/**",
]
```
The automatic approval review prompt also receives the parent turn's
denied-read context, so review decisions can account for the active
permission profile.
## What Changed
- Render the effective filesystem profile in `<environment_context>`,
including profile type, filesystem entries, workspace roots, and
non-escalatable deny entries.
- Persist effective `workspace_roots` in `TurnContextItem` so
resumed/replayed context does not have to bind `:workspace_roots`
through legacy `cwd` fallback.
- Add explicit permission instructions that denied reads are policy
restrictions, not escalation targets.
- Pass the parent turn's denied-read context into automatic approval
reviews.
- Add targeted coverage for prompt rendering, workspace-root
materialization, replay context, and review prompt context.
- Keep the prompt-context test expectations platform-aware so the same
filesystem rendering assertions pass on Unix and Windows paths.
## Testing
- `just test -p codex-core
context::environment_context::tests::serialize_environment_context_with_full_filesystem_profile`
- `just test -p codex-core
context::environment_context::tests::turn_context_item_filesystem_uses_workspace_roots_instead_of_cwd`
- `just test -p codex-core
context::permissions_instructions::permissions_instructions_tests::builds_permissions_from_profile_with_denied_reads`
- `just fix -p codex-core`
I also attempted `just test -p codex-core`; the changed prompt-context
tests passed, but the full local run did not complete cleanly in this
sandboxed macOS environment due unrelated user-shell `CODEX_SANDBOX*`
expectations and integration-test timeouts.
## Summary
Adds an optional `clientId` field to app-server v2 `UserInput` and
carries it through the core `UserInput` model so clients can correlate
echoed user input items without relying on payload equality.
## Details
- Adds `client_id: Option<String>` to core `UserInput` variants.
- Exposes the v2 app-server field as `clientId` on the wire and in
generated TypeScript.
- Preserves the id when converting between app-server v2 and core
protocol types.
- Regenerates app-server schema fixtures.
## Validation
- `just fmt`
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-protocol`
- `git diff --check`
## Why
`codex exec-server` has a local WebSocket listener, but it did not apply
the same browser-origin request handling as the `app-server` WebSocket
transport. Requests that carry an `Origin` header should not be upgraded
by this local transport, keeping both local WebSocket servers consistent
and avoiding unexpected browser-initiated connections.
## What changed
- Added an Axum middleware guard in
`codex-rs/exec-server/src/server/transport.rs` that returns `403
Forbidden` for requests carrying an `Origin` header.
- Added an integration test in `codex-rs/exec-server/tests/websocket.rs`
that covers rejection of an `Origin`-bearing WebSocket handshake.
- Kept ordinary WebSocket clients unchanged: existing no-`Origin`
initialization and process behavior remains covered by the crate tests.
## Validation
- `just test -p codex-exec-server` test phase (`186 passed`; run outside
the parent macOS sandbox so nested sandbox tests can execute)
- `just clippy -p codex-exec-server`
## Why
When Guardian or the sandbox network proxy detects and denies a network
attempt, core cancels the associated execution through `ExecExpiration`.
The Windows sandbox capture path was only forwarding the timeout
component of that expiration state. As a result, a sandboxed Windows
command whose network attempt had already been denied could keep running
until its timeout elapsed rather than terminating promptly in response
to the denial.
This change closes that cancellation-propagation gap for Windows sandbox
execution.
## What changed
- Added `WindowsSandboxCancellationToken` as the cancellation hook
exposed to Windows capture backends.
- Extracted the cancellation token from `ExecExpiration` in core and
passed it to both the direct and elevated Windows sandbox capture paths
alongside the existing timeout.
- Updated direct capture to poll for either process exit, timeout, or
cancellation and to terminate cancelled processes without reporting them
as timed out.
- Updated elevated capture to watch for cancellation and send the
existing `Terminate` IPC frame to the elevated runner. The watcher parks
for 50 ms between checks to bound response latency without a tight busy
wait.
- Added Windows regression coverage for a long-running PowerShell
command: cancellation ends capture before its timeout and does not set
`timed_out`.
- Added a visible skip diagnostic when that PowerShell-dependent
regression test cannot execute, and consolidated the duplicated
expiration-policy branch identified in review.
## Security
This improves enforcement after a denied network attempt has been
attributed to a Windows sandboxed execution: the command no longer
remains alive simply because Windows capture lost the cancellation
signal.
This PR does not claim to make Windows offline mode an airtight
no-network or no-exfiltration boundary. It does not introduce
AppContainer or change how network denial is detected; it makes an
already-detected denial promptly stop the affected sandboxed command.
## Validation
### Commands run
- `just fmt`
- `cargo test -p codex-windows-sandbox`
- `cargo test -p codex-core network_denial`
- `cargo clippy -p codex-core -p codex-windows-sandbox --tests --no-deps
-- -D warnings`
- `just argument-comment-lint -p codex-windows-sandbox -p codex-core`
The new capture regression is `cfg(target_os = "windows")`, so Windows
CI is the execution coverage for that test path. The local macOS test
runs validate the host-runnable crate and core network-denial behavior.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
#23756 makes packaged Codex builds include and default to the bundled
zsh fork. The important reason to put that fork's directory at the front
of `PATH` is to keep executable-level escalation working after a command
leaves the original shell and later re-enters zsh through `env`.
The expected chain is:
1. The zsh fork runs the top-level shell command.
2. That command launches another program, such as `python3`, while
inheriting the `EXEC_WRAPPER` environment and the escalation socket fd.
3. That program spawns a shell script whose shebang is `#!/usr/bin/env
zsh` rather than `#!/bin/zsh`, and it does not close the escalation fd.
4. `/usr/bin/env` resolves `zsh` through `PATH`, so it must find the
packaged zsh fork before the system zsh.
5. Commands inside that nested script are intercepted by the zsh fork
and can still request escalation from Codex.
If `PATH` resolves `zsh` to the system shell instead, the nested script
loses zsh-fork exec interception. Commands that should request
escalation can then run only in the original sandbox, or fail there,
without Codex ever receiving the approval request.
Shell snapshots make this slightly more subtle: a snapshot can restore
an older `PATH` after the child shell starts. This PR treats the zsh
fork `PATH` prepend as an explicit environment override so snapshot
wrapping preserves it.
## What Changed
- Added shared zsh-fork runtime helpers that prepend the configured zsh
executable parent directory to `PATH` without duplicate entries.
- Applied the zsh fork `PATH` prepend to both zsh-fork `shell_command`
launches and unified-exec zsh-fork launches before sandbox command
construction.
- Kept the shell-command zsh-fork backend API narrow: it derives the
configured zsh path from session services and rebuilds its sandbox
environment from `req.env`, rather than accepting a second, competing
environment map or a separately threaded bin dir.
- Kept Unix-only zsh-fork `PATH` mutation out of Windows clippy-visible
mutability.
- Added coverage for duplicate `PATH` entries, for preserving the zsh
fork prepend through shell snapshot wrapping, and for the nested
`python3` -> `#!/usr/bin/env zsh` escalation flow.
## Testing
- `just fmt`
- `just fix -p codex-core`
I left final test validation to CI after the latest review-comment
cleanup. Before that cleanup, `just test -p codex-core zsh_fork` passed
locally for the zsh-fork-focused tests.
Fixes#12496.
## Why
Windows sandboxed PowerShell commands can run under
`ConstrainedLanguage` on some machines, especially enterprise-managed
Windows environments. In that mode, our PowerShell command prelude could
fail before every command because it directly assigned
`[Console]::OutputEncoding` to UTF-8. The actual user command still ran,
but Codex surfaced noisy `Cannot set property. Property setting is
supported only on core types in this language mode.` output for every
shell call.
## What Changed
- Makes the PowerShell UTF-8 output encoding prelude best-effort by
wrapping the assignment in `try { ... } catch {}`.
- Keeps the existing UTF-8 behavior when PowerShell allows the
assignment.
- Adds focused tests for adding the prelude and avoiding duplicate
prelude insertion.
## Validation
- `cargo fmt -p codex-shell-command`
- `cargo check -p codex-shell-command`
- `git diff --check`
- Verified a local `ConstrainedLanguage` PowerShell probe prints only
the command output with no property-setting error.
- Verified `codex exec` from a temporary `chcp 437` context reports
`utf-8` / `65001` and preserves non-ASCII output (`café`, `漢字`).
## Why
`/diff` is intended to display working-tree changes, but its Git
invocations honored repository-selected executable helpers. A repository
could configure diff/text conversion helpers, clean/process filters,
`core.fsmonitor`, or `post-index-change` hooks that execute when a user
runs `/diff`.
Fixes
[PSEC-4395](https://linear.app/openai/issue/PSEC-4395/codex-cli-diff-executes-repository-selected-diff-helpers).
## What Changed
- Pass `--no-textconv` and `--no-ext-diff` for tracked and untracked
diff generation.
- Discover configured `filter.<driver>.clean` and `.process` entries,
then neutralize the selected drivers through structured
`GIT_CONFIG_KEY_*` / `GIT_CONFIG_VALUE_*` overrides, including driver
names containing `=`.
- Run all `/diff` Git probes with `core.fsmonitor=false` and a null
`core.hooksPath`.
- Use short submodule reporting while ignoring dirty submodule
worktrees, since inspecting a checked-out submodule for dirtiness can
execute filters from that child repository. This intentionally omits
dirty-only submodule markers in order to preserve the non-executing
security boundary.
- Add real-Git marker tests covering filters, fsmonitor, hooks, and
configured helpers inside checked-out submodules.
## How to Test
1. In a repository with ordinary tracked and untracked edits, run
`/diff`.
2. Confirm the normal working-tree diff is shown for top-level files.
3. Run the targeted tests below; they configure executable marker
helpers for repository filters, fsmonitor, hooks, and a checked-out
submodule, then verify `/diff` does not invoke them.
4. Confirm a dirty-only submodule does not cause Codex to enter the
submodule and execute its configured helper.
Targeted tests:
- `just test -p codex-tui get_git_diff_`
Validation note: `just test -p codex-tui` runs the new coverage, but
this worktree currently also has two unrelated failing guardian tests:
`app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
and
`app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`.
## Summary
- Add `--stdio` as a direct alias for `codex app-server --listen
stdio://`.
- Keep `--stdio` and `--listen` mutually exclusive.
- Update the app-server README to document both forms.
The codex-windows runner group should be much faster than the default
GHA runners. Since bazel jobs on windows are frequently the long pole
for PRs checks, this will hopefully get people landing a bit faster.
## Why
Add a standalone image generation path that can be exercised
independently of hosted Responses image generation, while retaining the
hosted tool as fallback unless the extension is actually available to
the model.
## What changed
- Added the `codex-image-generation-extension` crate with standalone
generate/edit execution, prior-image selection for edits, model-visible
image output, and local generated-image persistence.
- Installed the extension in app-server behind the disabled-by-default
`imagegenext` feature and backend eligibility checks.
- Updated core tool planning so eligible `image_gen.imagegen` exposure
replaces hosted `image_generation`, while unavailable configurations
retain hosted fallback.
- Added coverage for extension behavior, edit history reuse, feature
gating, auth eligibility, and hosted-tool replacement.
- The extension is installed through app-server only in this PR; other
execution paths retain hosted image generation because hosted
replacement occurs only when the standalone executor is actually
registered and model-visible.
- The initial extension contract intentionally fixes the image model to
`gpt-image-2` and uses automatic image parameters.
- Native generated-image history/card parity and rollout persistence
cleanup are intentionally deferred follow-up work.
## Validation
- `just test -p codex-image-generation-extension`
- `just test -p codex-features`
- `just test -p codex-core
hosted_tools_follow_provider_auth_model_and_config_gates`
- `just test -p codex-app-server`
- `just fix -p codex-image-generation-extension -p codex-features -p
codex-core -p codex-app-server`
- `just fmt`
- `just bazel-lock-update`
- `just bazel-lock-check`
---------
Co-authored-by: jif-oai <jif@openai.com>
## Why
#24744 introduced the thread idle lifecycle hook so idle continuation
can be owned by lifecycle contributors instead of hard-coded goal
runtime plumbing. Task completion still called
`goal_runtime_apply(GoalRuntimeEvent::MaybeContinueIfIdle)` directly, so
the post-turn idle transition remained goal-specific and did not notify
generic thread lifecycle contributors.
## What Changed
- Add `Session::emit_thread_idle_lifecycle_if_idle()` to gate idle
emission on both no active turn and no queued trigger-turn mailbox work.
- Call that helper when a task clears the active turn, replacing the
direct `GoalRuntimeEvent::MaybeContinueIfIdle` path.
- Cover the behavior with `codex-core` session tests for emitting after
task completion and suppressing idle emission while trigger-turn mailbox
work is pending.
## Verification
- New tests in `core/src/session/tests.rs` exercise the idle lifecycle
emission and trigger-turn mailbox guard.
This change keeps unified @mentions behind the mentions_v2 gate, moves
the flag to under-development, and polishes mention rendering/history
behavior.
It also adds a few small improvements to the mentions feature around
mention rendering and history round-tripping for plugin/tool mentions in
message edit scenarios. Plugin selections now insert `@` mentions with
better casing, and saved history preserves the visible sigil so recalled
messages look the same as what the user typed.
- Preserves `@` sigils when encoding/decoding mention history for
tool/plugin paths.
- Improves plugin mention insertion so display names/casing are
reflected more cleanly in the composer.
- Update composer to render user-entered plugin mentions in the same
color as the mentions menu. ALso applies to recalled/edited messages.
- Left/right arrows no longer switch unified-mention search modes after
an @mention has already been accepted (Ex: arrowing left through a
composed message that contains @mentions).
- Keeps bound mentions stable around punctuation, so accepted `@`
mentions do not reopen the popup and punctuated `$` mentions still
persist to cross-session history.
**Steps to test**
- Ensure mentions_v2 is enabled through configuration or `--enable
mentions_v2`
- Type `@` in the TUI composer and verify filesystem/plugin/skill
results are displayed in the unified mentions menu.
- Select a plugin mention from the `@` popup and confirm the inserted
text is an `@...` mention with casing, then recall/edit the message and
confirm it still renders as `@...`.
- Mention a skill and verify that skills still insert as `$skill`
mentions rather than `@` mentions.
- Verify punctuated mentions such as `@plugin.` and `($skill)` keep
their bound mention behavior across editing and history recall.
## Summary
Amazon Bedrock should expose GPT-5.5 alongside GPT-5.4, and the Bedrock
GPT entries should stay aligned with the canonical bundled OpenAI model
metadata instead of carrying a separate hand-written copy that can drift
over time. This change will be merged when the model is online.
This change:
- Adds the Bedrock Mantle model id for `openai.gpt-5.5`.
- Builds the Bedrock GPT-5.5 and GPT-5.4 catalog entries from the
bundled OpenAI model catalog, then overrides the Bedrock-facing slug,
explicit priority, and Bedrock-specific context windows.
- Hardcodes both `context_window` and `max_context_window` to `272000`
for Bedrock GPT-5.5 and GPT-5.4.
- Keeps `openai.gpt-5.5` as the default Bedrock model ahead of
`openai.gpt-5.4` and the Bedrock OSS models.
## Why
PR #24813 added extension `TurnItemEmitter` coverage and introduced a
test that records a conversation history item before asserting
extension-emitted turn item events.
`record_conversation_items()` also emits a `RawResponseItem` event to
observers. The test was reading from the same event receiver and
expected the next event to be `ItemStarted`, so the test failed reliably
once the setup history item was present.
## What Changed
Update
`passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call` to
consume and assert the expected setup `RawResponseItem` before checking
the extension `ItemStarted`, `WebSearchBegin`, `ItemCompleted`, and
`WebSearchEnd` events.
This is test-only and does not change extension runtime behavior.
## Verification
- `cargo nextest run --no-fail-fast -p codex-core
tools::handlers::extension_tools::tests::passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call`
## Summary
- Let `close_agent` clean up an agent that is still registered in
`AgentRegistry` even when its underlying thread is already missing.
- Preserve the explicit-close boundary: for known stale thread-spawn
agents, mark the persisted spawn edge `Closed`, then treat
`ThreadNotFound` / `InternalAgentDied` as a successful close so the
registry slot can be released.
- Add a regression for MultiAgentV2 task-name targets where
`close_agent("worker")` succeeds after the worker thread has already
disappeared.
## Motivation
A worker can disappear from `ThreadManager` while its metadata still
exists in the root `AgentRegistry`. Before this change, the close tool
failed while trying to subscribe to the missing thread status, so it
never reached the cleanup path that releases the registered agent slot.
With `agents.max_threads = 1`, an explicit close of that stale task-name
agent could fail and leave the session unable to spawn a replacement.
## Scope
This PR intentionally does not add automatic stale-agent reaping to
`spawn_agent`, `resume_agent`, or `list_agents`. A thread being missing
from `ThreadManager` is not the same as an explicit close: persisted
open spawn edges are still the durable source of truth for resume and
task-name ownership until `close_agent` is called.
## Validation
- `just test -p codex-core -E
'test(multi_agent_v2_close_agent_reaps_stale_task_name_target) |
test(resume_agent_from_rollout_reopens_open_descendants_after_manager_shutdown)'`
- `just fix -p codex-core`
## Summary
The client currently calls `thread/resume` to establish live updates and
immediately follows it with `thread/turns/list` to hydrate recent turns.
This lets `thread/resume` return that page directly, eliminating a round
trip and the ordering/deduplication gap between the two calls.
Experimental clients opt in with `initialTurnsPage: { limit,
sortDirection, itemsView }`. The response returns `initialTurnsPage` as
a `TurnsPage`, including cursors for paging further back in history.
Keeping the controls in a nested opt-in object provides the useful
`thread/turns/list` knobs without spreading page-specific parameters
across `thread/resume`.
## Verification
- `just fmt`
- `just write-app-server-schema --experimental`
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server
thread_resume_initial_turns_page_matches_requested_turns_list_page
--tests`
- `cargo test -p codex-app-server
thread_resume_rejoins_running_thread_even_with_override_mismatch
--tests`
- `just fix -p codex-app-server-protocol -p codex-app-server`
## Why
Extension-contributed tools need to emit visible turn items through
Codex's normal event and persistence pipeline.
## What
- Add `TurnItemEmitter` to extension `ToolCall`s and route the core
implementation through `Session::emit_turn_item_*`.
- Hold weak session and turn references so retained tool calls cannot
keep host state alive.
- Provide a no-op emitter for extension test callers.
## Test Plan
- `just test -p codex-core -E
'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'`
---------
Co-authored-by: jif-oai <jif@openai.com>
It seems that this was added to allow rustc to load proc macros that had
been compiled with UBSan enabled, which zig does for debug and
`ReleaseSafe` builds. When zig drives the link of the final binary it
knows to include the ubsan runtime, but our zig-built artifacts are
being linked into a binary whose linking rustc drives. This removes the
libubsan workaround we have and replaces it with
`-fno-sanitize=undefined` passed to zig.
The new argument is passed at the end of zig's args so should take
precedence over any earlier arguments from the script's caller.
## Why
Goal tools create and update goal state for a persistent thread. The
extension was only checking whether goals were enabled before
advertising those tools, which meant they could be surfaced in contexts
that should not receive thread goal controls: ephemeral threads without
persistent thread state and review subagents.
Those sessions can still run the goal extension lifecycle, but the
thread tools should only be visible when the current thread can safely
use them.
## What changed
- Adds a `GoalRuntimeConfig` that separates goal enablement from whether
goal tools are available for the current thread.
- Computes tool eligibility on thread start from
`persistent_thread_state_available` and `SessionSource`, hiding tools
for review subagents.
- Uses `GoalRuntimeHandle::tools_visible()` when contributing thread
tools so enabled runtime state does not automatically imply tool
exposure.
- Adds backend coverage for hiding goal tools on ephemeral threads and
review subagents.
## Testing
- Added `goal_tools_hidden_for_ephemeral_threads`.
- Added `goal_tools_hidden_for_review_subagents`.
## Summary
- Add a new `app-server-start-bench` crate to measure app-server startup
performance
- Wire the benchmark into the workspace and Bazel build so it can be run
consistently
- Update lockfiles and repo automation to account for the new package
## Summary
- update the bundled `openai-docs` system skill to match the latest
`openai-docs-plus` content from `skills-internal`
- add the cached Codex manual fetch helper and expand the skill routing
for Codex self-knowledge
- keep the stable local skill identity and labels as `openai-docs`
## Why
The built-in OpenAI Docs skill needed to reflect the current upstream
guidance from `skills-internal` while preserving the local system-skill
name used by Codex.
## Impact
Codex now ships the newer OpenAI Docs skill behavior for Codex
self-knowledge and manual-first documentation lookups.
## Validation
- `just test -p codex-skills`
- exact directory diff against transformed `skills-internal`
`origin/main` was clean
Summary
- Add TurnErrorInput and TurnLifecycleContributor::on_turn_error to the
extension API.
- Emit the turn-error lifecycle from core turn error paths, including
usage limit failures.
- Add direct lifecycle coverage for the emitted error facts and stores.
Tests
- just fmt
- git diff --check
- Not run: full tests or clippy (per instructions)
Summary: add session source and persistent-state availability to
ThreadStartInput; populate them from session init; update existing goal
test harness constructors. Tests: just fmt; git diff --check. No full
tests or clippy run per request.
## Summary
- refresh managed ChatGPT auth during auth resolution when its access
token is inside ChatGPT web's five-minute near-expiry window
- cover refresh-window decisions while preserving the existing
expired-token refresh path
## Why
Codex already resolves managed ChatGPT auth before outbound requests and
refreshes expired access tokens there. This change adjusts the existing
predicate to refresh a still-valid access token once it is within the
same five-minute refresh window used by ChatGPT web, avoiding a request
with a token about to expire.
A cross-process serialization follow-up was explored in #24663 and
closed for now; we do not currently suspect cross-process refresh races
are a root cause of the refresh errors under investigation.
External-token, API-key, and Agent Identity auth modes remain unchanged.
## Validation
- `bazel test //codex-rs/login:login-all-test`
- `just fmt` runs Rust formatting successfully, then its Python SDK Ruff
step cannot install `openai-codex-cli-bin==0.131.0a4` on this Linux
environment because no compatible wheel is published.
## Why
Guardian reviews already emit analytics events, but we do not expose
aggregate OpenTelemetry metrics for review volume, latency, token usage,
or terminal outcomes. That makes it harder to monitor Guardian behavior
during rollouts and to compare review outcomes by source, action type,
session kind, model, and failure mode.
## What Changed
- Added Guardian review metric names for count, total duration, time to
first token, and token usage in `codex-rs/otel`.
- Added `core/src/guardian/metrics.rs` to convert
`GuardianReviewAnalyticsResult` into sanitized metric tags covering
decision, terminal status, failure reason, approval request source,
reviewed action, session kind, risk/outcome, model, reasoning effort,
and context/truncation state.
- Emitted the new metrics from `track_guardian_review` for each terminal
Guardian review result.
## Testing
- Added
`guardian_review_metrics_record_counts_durations_and_token_usage`, which
verifies the emitted count, duration, TTFT, token usage histograms, and
tag set through the in-memory metrics exporter.
## Why
Dedicated memories tools are exposed through a Responses API namespace
tool. The namespace itself has to be a valid tool identifier, so
`memories/` can fail validation before the model ever gets a chance to
call the memory tools.
## What changed
- Changed `MEMORY_TOOLS_NAMESPACE` from `memories/` to `memories`.
- Added `memory_tool_namespace_matches_responses_api_identifier` so the
namespace stays non-empty and limited to Responses-safe identifier
characters.
## Verification
- Added unit coverage for the namespace identifier shape in
`codex-rs/ext/memories/src/tests.rs`.
## Summary
- Add the required `/*parent_thread_id*/` argument comment at the
Guardian review session test callsite flagged by CI.
## Validation
- `just fmt`
- Not run: clippy/tests, per request; CI will cover them.
## Why
Guardian review sessions are reusable across forks when their
`GuardianReviewSessionReuseKey` is unchanged, but the underlying
Responses request was still using the child thread ID as
`prompt_cache_key`. That meant forked Guardian reviews that should share
cache context produced different cache keys, reducing prompt cache reuse
and weakening the reuse invariant.
## What Changed
- Adds a `ModelClient` prompt cache key override and uses it for
`ResponsesApiRequest.prompt_cache_key`.
- Computes Guardian review cache keys as
`guardian:<sha1(parent_thread_id:reuse_key)>`, scoped to the parent
thread plus the reuse-sensitive Guardian config.
- Wires session construction to apply that override only for Guardian
sub-agent sessions.
## Testing
- Added coverage that Guardian cache keys are stable for the same
parent/reuse key, change when either the parent thread or reuse key
changes, fit within the Responses API length limit, and are absent for
non-Guardian sessions.
- Extended the parallel review test to assert forked Guardian reviews
send the same `prompt_cache_key`.
Split from the Guardian prompt cache key change. This PR only updates
codex-rs/core/src/session/session.rs. Validation was not run per
request; this branch is expected to rely on the companion split PRs.
Split from the Guardian prompt cache key change. This PR only updates
codex-rs/core/src/guardian/tests.rs. Validation was not run per request;
this branch is expected to rely on the companion split PRs.
Split from the Guardian prompt cache key change. This PR only updates
codex-rs/core/src/guardian/review_session.rs. Validation was not run per
request; this branch is expected to rely on the companion split PRs.
Split from the Guardian prompt cache key change. This PR only updates
codex-rs/core/src/guardian/mod.rs. Validation was not run per request;
this branch is expected to rely on the companion split PRs.
Split from the Guardian prompt cache key change. This PR only updates
codex-rs/core/src/client.rs. Validation was not run per request; this
branch is expected to rely on the companion split PRs.
## Why
Config loading should not create or write-authorize the memories root
just because memory support exists. Memory startup is the code path that
actually materializes that tree.
## What
- Stop creating the memories root during Config load and remove it from
legacy workspace-write projections.
- Grant the memories root read access only when the memories feature and
use_memories are enabled.
- Create the memories root inside memories startup before seeding
extension instructions.
- Update config and startup tests around the ownership boundary.
## Tests
- just fmt
- just fix -p codex-core
- just fix -p codex-memories-write
- just test -p codex-core
memory_tool_makes_memories_root_readable_without_creating_or_widening_writes
workspace_write_includes_configured_writable_root_once_without_memories_root
permission_profile_override_keeps_memories_root_out_of_legacy_projection
permissions_profiles_allow_direct_write_roots_outside_workspace_root
default_permissions_profile_populates_runtime_sandbox_policy
- just test -p codex-memories-write memories_startup_creates_memory_root
Note: a broader just test -p codex-core run is not clean in this
sandbox; it hit missing test_stdio_server plus seatbelt, realtime, and
environment-sensitive failures. The changed config tests above pass.
## Summary
- Treat `sdk/python` as a development template with source version
`0.0.0-dev`, matching the existing Python runtime packaging pattern.
- Have `python-v*` tags supply the published SDK beta version through
the existing `stage-sdk --sdk-version` path.
- Remove the workflow check requiring a source version bump for each
beta release and remove its now-unused host Python setup step.
- Keep the reviewed runtime dependency pin at
`openai-codex-cli-bin==0.132.0`.
- Remove beta-number-specific documentation so it does not need editing
for each publish.
## Why
The package staging script already writes the release version into the
artifact. Requiring the checked-in SDK template version to match every
tag adds release-only source churn without changing the package users
receive.
## Validation
- Not run locally; relying on online CI for this workflow and metadata
change.
## Release
After this PR lands, publish the next beta by pushing tag
`python-v0.1.0b2` from merged `main`.
## Summary
- Remove the beta warning callout from the PyPI-facing Python SDK
README.
- Keep the existing Beta title and install/usage guidance unchanged.
## Validation
- Not run locally; relying on online CI for this documentation-only
change.
## Release
- Land this change before publishing the next Python SDK beta.
## Summary
- Remove the Python language classifiers from the Python SDK package
metadata.
- Keep `requires-python = ">=3.10"` as the package's interpreter
compatibility constraint.
- Avoid presenting a curated version-support list in PyPI metadata.
## Validation
- Not run locally; relying on online CI for this metadata-only change.
## Release
- Land this change before publishing the next Python SDK beta.
## Summary
- Remove the exact-version install snippet from the PyPI-facing Python
SDK README.
- Remove the release-selection explanation so the install section
presents the standard `pip install openai-codex` path directly.
## Validation
- Not run locally; relying on online CI for this documentation-only
change.
## Summary
- classify known refresh-token terminal failures from `/oauth/token` as
permanent even when the backend returns `400`
- preserve the existing relogin-required message for
`refresh_token_reused` instead of retrying and collapsing into a generic
cloud requirements error
- add regression coverage for `400 refresh_token_reused`
## Testing
- `just fmt`
- `cargo test -p codex-login`
## Why
The initial public `openai-codex` beta should read and install like a
normal published Python package before a release tag is created. This
follows merged PR #24828, which establishes the independent SDK beta
release plumbing and exact runtime dependency.
## What changed
- Rewrote `sdk/python/README.md` as a compact PyPI-facing beta package
page: published installation, one quickstart, short login examples,
built-in help, and links to deeper guides.
- Updated the getting-started guide, API reference, FAQ, and examples
index to present the published beta consistently without repeating
onboarding in the package landing page or reference page.
- Made `pip install openai-codex` the primary install path while beta
releases are the only published SDK releases, with `--pre` documented
for opting into prereleases after a stable release exists.
- Added curated `help()` / `pydoc` docstrings across the public API and
generated public convenience methods through
`scripts/update_sdk_artifacts.py`.
- Declared the repository `Apache-2.0` license expression and
Documentation URL in package metadata, without introducing a duplicated
SDK-local license file.
- Kept the source distribution focused on installable package material
(`src/openai_codex`, `README.md`, and `pyproject.toml`); the repository
docs and runnable examples remain linked from the PyPI README.
- Built release artifacts in an Alpine container on the Ubuntu runner,
matching Python SDK CI and allowing type generation to install the
published `musllinux` runtime wheel.
- Added `twine check --strict` to the release workflow so malformed PyPI
metadata or rendered README content fails before publishing.
- Added focused SDK assertions for beta metadata, the exact runtime pin,
source distribution contents, and the built-in Python documentation
surface.
## Validation
- Ran `uv run --frozen --extra dev ruff check
scripts/update_sdk_artifacts.py src/openai_codex
tests/test_public_api_signatures.py
tests/test_artifact_workflow_and_binaries.py` before the final
README-only reductions and review-fix follow-ups.
- Built `openai_codex-0.1.0b1-py3-none-any.whl` and
`openai_codex-0.1.0b1.tar.gz` before the final README-only reductions
and review-fix follow-ups.
- Ran `python -m twine check --strict` on both built artifacts before
the final README-only reductions and review-fix follow-ups.
- Verified artifact metadata reports `Apache-2.0` without a duplicated
SDK-local license file.
- Verified `inspect.getdoc(...)` resolves documentation for the package,
`Codex`, `CodexConfig`, and key generated thread methods.
- Rebased the documentation/readiness change onto merged PR #24828
without changing the intended SDK or workflow file contents.
- Final verification is delegated to online CI for this PR.
## Why
`openai-codex` needs a beta release lifecycle without requiring beta
releases of its pinned runtime package. Previously, SDK staging rewrote
its runtime dependency to the SDK version, which made an SDK-only beta
impossible.
## What changed
- Set the initial SDK beta version to `0.1.0b1` and pin it to published
stable `openai-codex-cli-bin==0.132.0`.
- Decoupled SDK release staging from runtime versioning so it preserves
the reviewed exact runtime pin.
- Added a `python-v*` tag workflow that builds and publishes only
`openai-codex` through PyPI trusted publishing.
- Removed the Beta classifier from runtime package metadata for future
runtime publications.
- Regenerated protocol-derived SDK models from the selected stable
runtime package.
`0.132.0` is the newest stable runtime admitted by the checked-in
dependency date fence and retains the Linux wheel family currently used
by SDK CI.
## Release setup
Before pushing `python-v0.1.0b1`, configure PyPI trusted publishing for
the `openai-codex` project with workflow `python-sdk-release.yml`,
environment `pypi`, and job `publish-python-sdk`.
## Validation
- `uv run --frozen --extra dev ruff check src/openai_codex scripts
examples tests`
- Parsed `.github/workflows/python-sdk-release.yml` with PyYAML.
- Built staged release artifacts locally:
`openai_codex-0.1.0b1-py3-none-any.whl` and
`openai_codex-0.1.0b1.tar.gz`.
- Verified wheel metadata pins `openai-codex-cli-bin==0.132.0`.
- Tests are deferred to online CI for this PR.
## Why
Dynamic tools are defined at thread start and already stored in rollout
`SessionMeta`, which restores resumed and forked sessions. Persisting
the same tools through SQLite creates a second runtime persistence path
that is unnecessary prework for the explicit namespace refactor.
## What changed
- Restore missing thread-start dynamic tools directly from rollout
history, including when SQLite is enabled.
- Remove SQLite dynamic-tool reads, writes, backfill, and thread
metadata patch plumbing.
- Add SQLite-enabled resume integration coverage that verifies a
rollout-defined dynamic tool is still sent after resume.
## Compatibility
The existing `thread_dynamic_tools` table is intentionally not dropped
even though it's now unused. Older Codex binaries are allowed to open
databases migrated by newer binaries and still reference this table;
dropping it would break that mixed-version path. See
[here](https://github.com/openai/codex/blob/main/codex-rs/state/src/migrations.rs#L10-L11).
## Verification
- `just test -p codex-state -p codex-rollout -p codex-thread-store`
- `just test -p codex-core --test all
resume_restores_dynamic_tools_from_rollout_with_sqlite_enabled`
## Why
`AppServerConfig` is exported as part of the ergonomic Python SDK
surface and passed to `Codex(...)` and `AsyncCodex(...)`. That name
exposes the underlying app-server transport at the same layer where
users are configuring the Codex client. `CodexConfig` makes the common
callsite read naturally and names the object it configures.
## What changed
- Renamed the public configuration dataclass from `AppServerConfig` to
`CodexConfig`.
- Updated `Codex`, `AsyncCodex`, and the transport clients to accept
`CodexConfig`.
- Updated binary-resolution messages, package exports, docs, examples,
and related coverage to use the new public name.
## API impact
```python
from openai_codex import Codex, CodexConfig
with Codex(config=CodexConfig(codex_bin="/path/to/codex")) as codex:
...
```
Callers should now import and construct `CodexConfig`; `AppServerConfig`
is no longer part of the Python SDK surface.
## Validation
- `uv run --frozen --extra dev ruff check src/openai_codex scripts
examples tests`
- Tests are deferred to online CI for this PR.
## Why
The key/value markdown table renderer added in #24636 still operates on
`Line` values, while table cells and rendered table output now carry
`HyperlinkLine`. That mismatch breaks `codex-tui` compilation on `main`
and would risk losing semantic web-link annotations if corrected by
flattening the values.
## What changed
- Make key/value record rendering wrap and emit `HyperlinkLine` values
consistently with the existing grid renderer.
- Remap wrapped hyperlink ranges and shift them when value content is
prefixed by record-mode indentation or labels.
- Add focused coverage verifying key/value fallback output preserves
web-link destinations.
## Verification
- `just test -p codex-tui -E
'test(key_value_table_keeps_web_annotations) |
test(/table_renders_(key_value_records_when_compact_fragmentation_is_systemic_snapshot|stacked_key_value_records_when_path_column_becomes_too_narrow_snapshot|records_when_multiple_prose_columns_are_starved_snapshot)/)'`
WIll make it easier to uprev when the new draft spec is supported.
Also updates reqwest where needed for compatibility but doesn't update
it everywhere since this is already a large diff.
The new version of rmcp handles certain kinds of authentication failures
differently, this patch includes support for identifying the failing scope
in a WWW-Authenticate header.
## Overview
Allow remote `codex exec-server` registration to use existing API-key
auth while restricting where those credentials can be sent.
- Accept `CodexAuth::ApiKey` for the normal `--remote` registration
path.
- Restrict API-key remote registration to HTTPS `openai.com` and
`openai.org` hosts and subdomains, with explicit HTTP loopback support
for local development.
- Disable registry registration redirects so credentials cannot be
forwarded to an unvalidated destination.
- Retain `--use-agent-identity-auth` as the explicit Agent Identity
path.
- Document remote registration using `CODEX_API_KEY`.
## Big picture
Callers can now provide an API key directly to `exec-server`
registration without first establishing ChatGPT login state:
```sh
CODEX_API_KEY="$OPENAI_API_KEY" \
codex exec-server \
--remote "https://<host>.openai.org/api" \
--environment-id "$ENVIRONMENT_ID"
```
## Validation
- `cargo fmt --all` (`just fmt` is not installed on this host)
- `cargo test -p codex-cli -p codex-exec-server`
## Stack
- **Base: #24489 [1 of 2]** - render markdown tables in app style.
- **Current: #24636 [2 of 2]** - render cramped markdown tables as
key/value records.
Review this PR against `fcoury/app-style-markdown-tables`; it contains
only the fallback behavior for cramped tables.
## Why
The row-separated markdown table rendering in #24489 remains readable
while columns have usable room. Once long links or multiple prose-heavy
columns are compressed into narrow allocations, however, the grid can
turn words and paths into tall vertical strips that are difficult to
scan. In those cases the content matters more than preserving the grid
shape.
## What Changed
<table>
<tr><td>
<p align="center"><b>
Normal
</b></p>
<img width="1722" height="619" alt="CleanShot 2026-05-27 at 14 32 57"
src="https://github.com/user-attachments/assets/d04f5fbd-6064-4acd-91bd-072d19b983df"
/>
</td></tr>
<tr><td>
<p align="center"><b>
Narrow
</b></p>
<img width="863" height="1013" alt="CleanShot 2026-05-27 at 14 33 12"
src="https://github.com/user-attachments/assets/6a7d2968-0a68-48fd-ab5d-209b3dbaf03e"
/>
</td></tr>
<tr><td>
<p align="center"><b>
Very narrow
</b></p>
<img width="435" height="746" alt="CleanShot 2026-05-27 at 14 33 47"
src="https://github.com/user-attachments/assets/f6a59e30-b1d2-4063-9c05-43933abc77d6"
/>
</td></tr>
</table>
- Detect tables whose grid allocation causes systemic token
fragmentation or starves multiple prose-heavy columns.
- Render those tables as repeated key/value records instead of retaining
an unreadable grid.
- Use aligned label/value records when there is useful horizontal room,
and switch to a stacked narrow-record layout where each label is
followed by a full-width value when width is especially constrained.
- Preserve the themed label color, rich inline formatting, links, and
the existing grid presentation for tables that remain readable.
- Add snapshot coverage for path-heavy narrow tables, prose-heavy issue
tables, systemic compact fragmentation, and a control case that should
continue to render as a grid.
## How to Test
1. Start Codex from this branch and render a normal multi-column
markdown table at a comfortable terminal width. Confirm it still appears
as the styled row-separated grid from #24489.
2. Render a table containing a long linked record identifier or
file-like value, then narrow the terminal until the grid would split the
value into vertical fragments. Confirm it switches to key/value records,
with labels above values at very narrow widths.
3. Render a table with multiple prose-heavy columns, such as an issue
summary table with `Issue`, `Activity`, `Complexity`, and `Why start`.
Confirm a cramped width switches to records rather than wrapping several
columns into hard-to-read strips.
4. Render a compact table where only one value wraps mildly. Confirm it
stays in grid form rather than switching prematurely.
## Validation
- Ran `just test -p codex-tui` while developing the fallback and
reviewed/accepted the intended new markdown-render snapshots. The
command still reports two unrelated existing guardian feature-flag test
failures outside this diff.
- Ran `just fix -p codex-tui` and `just fmt` after the Rust changes were
complete.
- `just argument-comment-lint` cannot reach source linting locally
because Bazel fails while resolving LLVM sanitizer headers; touched
positional literal callsites were inspected manually and annotated where
needed.
## Why
Wrapped URLs in rich TUI output, especially URLs rendered inside
Markdown tables, are split across terminal rows. In terminals that
support OSC 8 hyperlinks, treating each visible fragment as part of the
complete destination enables reliable open-link and copy-link actions
even after table layout wraps the URL.
This addresses the semantic-link portion of #12200 and the behavior
described in
https://github.com/openai/codex/issues/12200#issuecomment-4535452980. It
does not change ordinary drag-selection across bordered table rows.
## What Changed
- Added shared TUI OSC 8 support that validates `http://` and `https://`
destinations, sanitizes terminal payloads, and applies metadata
separately from visible line width/layout.
- Added semantic web-link annotations to assistant and proposed-plan
Markdown, including explicit web links and bare web URLs in prose and
table cells while excluding code and non-web Markdown destinations.
- Preserved complete URL targets through table wrapping, narrow pipe
fallback, streaming, transcript overlay rendering, history insertion,
and resize replay.
- Routed intentional Codex-owned links in notices,
status/setup/app-link, feedback, onboarding, MCP/plugin help, memories,
and update surfaces through the shared hyperlink handling.
## How to Test
1. Run Codex in a terminal with OSC 8 link support, such as Ghostty, and
request an assistant response containing a Markdown table whose last
column contains a long `https://` URL.
2. Make the terminal narrow enough for the URL to wrap across multiple
bordered table rows.
3. Use the terminal's open-link or copy-link action on more than one
wrapped URL fragment and confirm each fragment resolves to the complete
original URL.
4. Resize the terminal after the table is rendered and repeat the link
action to confirm the destination survives scrollback replay.
5. Open the transcript overlay while rich output is present and confirm
web links remain interactive there.
6. As a regression check, render inline/fenced code containing URL text
and a Markdown link such as
`[https://example.com](mailto:support@example.com)`; confirm these do
not acquire a web OSC 8 destination.
Targeted automated coverage exercised Markdown links and exclusions,
wrapped and pipe-fallback tables, streaming/transcript overlay
propagation, status-link truncation, and rendered word-wrapping cell
alignment. `just test -p codex-tui` was also run; it passed the
hyperlink coverage and reproduced two unrelated existing guardian
feature-flag test failures.
## Why
Interrupted `shell_command` calls can race with the outer tool-dispatch
cancellation path. When that happens, the runtime future may be dropped
before the spawned process gets a chance to run `SIGTERM` cleanup. For
bwrapd-backed Linux sandbox commands, that can leave synthetic
protected-path mount bookkeeping such as `.git/.codex` registrations
under `/tmp` behind after a TUI interruption.
The relevant cancellation points are the outer dispatch race in
[`core/src/tools/parallel.rs`](bd184ba847/codex-rs/core/src/tools/parallel.rs (L91-L132))
and the process shutdown logic in
[`core/src/exec.rs`](bd184ba847/codex-rs/core/src/exec.rs (L1367-L1393)).
## What changed
- Keep `shell_command` dispatch alive long enough for the runtime to
finish cancellation cleanup instead of immediately returning the
synthetic aborted response.
- Fold shell-turn cancellation into the existing `ExecExpiration` path
in
[`core/src/tools/runtimes/shell.rs`](bd184ba847/codex-rs/core/src/tools/runtimes/shell.rs (L267-L274)),
so cancellation and timeout behavior stay centralized.
- On cancellation, send `SIGTERM` first, wait briefly for cleanup to
run, then hard-kill any remaining descendants in the original process
group.
- Treat `ESRCH` as an already-gone process-group cleanup case in
`codex-utils-pty`, which keeps best-effort teardown from surfacing a
stale-process race as an error.
## Verification
- `cargo test -p codex-core cancellation`
- Added regression coverage for:
- `shell_tool_cancellation_waits_for_runtime_cleanup`
- `process_exec_tool_call_cancellation_allows_sigterm_cleanup`
Client-side namespace tools are now supported by bedrock. Enable
`namespace_tools` for the Amazon Bedrock provider while continuing to
disable unsupported hosted tools such as image generation and web
search.
## Stack
- **Current: #24489 [1 of 2]** - render markdown tables in app style.
- **Stacked follow-up: #24636 [2 of 2]** - render cramped markdown
tables as key/value records.
## Why
Markdown tables currently render as boxed terminal grids, which gives
ordinary assistant output a heavier visual treatment than surrounding
rich text. This row-separated layout is the best match for how the App
renders tables, while accented headers remain distinguishable even when
a terminal font renders bold subtly.
<table>
<tr><td>
<p align="center">Codex CLI - Before</p>
<img width="1722" height="742" alt="CleanShot 2026-05-25 at 18 46 17"
src="https://github.com/user-attachments/assets/f673d92a-ebd8-46e2-b414-3d985e41b6a4"
/>
</td></tr>
<tr><td>
<p align="center">Codex CLI - After</p>
<img width="1720" height="957" alt="image"
src="https://github.com/user-attachments/assets/36a3d331-bea1-439b-b5be-e97b0731bd6f"
/>
</td></tr>
<tr><td>
<p align="center">Codex App</p>
<img width="979" height="1293" alt="CleanShot 2026-05-25 at 18 45 04"
src="https://github.com/user-attachments/assets/7d97cae0-9256-4f6e-a4b3-8b8f22b0d901"
/>
</td></tr>
</table>
## What Changed
- Render markdown tables as padded, aligned rows without an enclosing
box.
- Style table headers with the active syntax-theme accent plus bold
text, while keeping separators low contrast and theme-aware.
- Use a segmented heavy header rule and thin body-row rules, preserving
wrapping, narrow-width fallback, streaming parity, and rich-history
rendering.
- Update focused assertions and snapshots for the final table layout.
## How to Test
1. Render a markdown table in the TUI with several rows and columns.
2. Confirm the header uses the active theme accent, rows use
one-character interior padding, and the table has no enclosing box.
3. Confirm the header is followed by segmented `━` rules and multiple
body rows are separated by muted segmented `─` rules.
4. Render the same table while streaming and in history/raw-mode
toggles; the final rich layout should remain stable.
5. Render a narrow table with long content and verify wrapping or pipe
fallback does not overflow.
## Validation
- `just test -p codex-tui table`
- `just test -p codex-tui streaming::controller::tests`
- `just argument-comment-lint-from-source -p codex-tui -- --all-targets`
- `just fix -p codex-tui`
- `just fmt`
`just test -p codex-tui` was also run after accepting the snapshots; it
fails only in the unrelated existing guardian app tests
`update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
and
`update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`.
## Why
Interrupting an active turn is currently fixed to `Esc`, which is easy
to hit accidentally and cannot be customized through `/keymap`. This
gives users a less accidental binding while preserving the existing
default.
## What Changed
- Adds `tui.keymap.chat.interrupt_turn` to `/keymap`, defaulting to
`esc` and supporting remapping or unbinding.
- Uses the configured interrupt binding for running-turn status, queued
steer interruption, and `request_user_input`, including the visible
hints.
- Preserves local `Esc` behavior for popups, Vim insert mode, and
`/agent` editing while validating conflicts with fixed/backtrack and
request-input navigation bindings.
- Adds behavior and snapshot coverage for remapped interruption paths.
## How to Test
1. Run Codex and open `/keymap`, then set **Interrupt Turn** to `f12`.
2. Start a turn and confirm `Esc` no longer interrupts it while `f12`
does; the running hint should display `f12 to interrupt`.
3. Queue a steer while a turn is running and confirm the preview
displays `f12`; pressing it should interrupt and submit the steer
immediately.
4. Trigger a `request_user_input` prompt and confirm its footer uses
`f12`; with notes open, `Esc` should still clear notes while `f12`
interrupts the turn.
5. Clear the Interrupt Turn binding and confirm the key-specific
interrupt hint is removed while `Ctrl+C` remains available.
Targeted validation:
- `just write-config-schema`
- `just fix -p codex-config`
- `just fix -p codex-tui`
- `just fmt`
- `just argument-comment-lint-from-source -p codex-config -p codex-tui`
- `just test -p codex-config`
- `cargo insta pending-snapshots --manifest-path tui/Cargo.toml`
- `just test -p codex-tui keymap_setup::tests`
- `just test -p codex-tui` (fails in two pre-existing guardian
feature-flag tests unrelated to this diff; the intentional picker
snapshot updates were reviewed and accepted)
## Why
Vim mode currently supports some normal-mode operators and motions, but
common text-object combinations like `ciw`, `daw`, `di(`, and
quote/bracket variants are still missing. That makes the composer feel
incomplete for users who expect operator + text object editing to work
inside prompts.
Closes#21383.
## What Changed
- Add Vim pending-state support for operator/text-object sequences.
- Add `c` as a normal-mode operator for text objects, so combinations
like `ciw` delete the object and enter insert mode.
- Support word, WORD, delimiter, and quote text objects:
- `iw`, `aw`, `iW`, `aW`
- `i(`, `a(`, `i)`, `a)`, `ib`, `ab`
- `i[`, `a[`, `i]`, `a]`
- `i{`, `a{`, `i}`, `a}`, `iB`, `aB`
- `i"`, `a"`, `i'`, `a'`, `i\``, `a\``
- Add configurable keymap entries and keymap picker coverage for the new
Vim text-object context.
- Regenerate the config schema and update keymap picker snapshots.
## How to Test
Manual smoke test:
1. Start Codex with Vim composer mode enabled.
2. Type a draft such as:
```text
alpha beta gamma
call(foo[bar], {"x": "hello world"})
say "one \"two\" three" now
```
3. Put the cursor on `beta`, press `ciw`, and confirm `beta` is removed
and the composer enters insert mode.
4. Escape back to normal mode, put the cursor on `gamma`, press `daw`,
and confirm `gamma` plus surrounding whitespace is removed.
5. Put the cursor inside `foo[bar]`, press `di[`, and confirm only `bar`
is removed.
6. Put the cursor inside `call(...)`, press `da(`, and confirm the whole
parenthesized section is removed.
7. Put the cursor inside the quoted text, press `ci"`, and confirm the
quote contents are removed and insert mode starts.
8. Verify cancellation does not edit text: press `d` then `Esc`, and
press `d` then `i` then `Esc`.
Targeted tests:
- `cargo test -p codex-tui --lib vim_`
- `cargo nextest run -p codex-tui keymap_setup::tests`
Additional local checks:
- `just write-config-schema`
- `just fmt`
- `just fix -p codex-tui`
- `git diff --check`
- `cargo insta pending-snapshots --manifest-path tui/Cargo.toml`
Local full-suite note: `just test -p codex-tui` ran to completion. The
keymap snapshot failures were expected and accepted. Two unrelated
guardian feature-flag tests still fail locally:
-
`app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
-
`app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`
`just argument-comment-lint` is currently blocked locally by Bazel
analysis before the lint runs because `compiler-rt` has an empty
`include/sanitizer/*.h` glob in the local Bazel cache. The touched Rust
diff was manually inspected for opaque positional literals.
## Why
The Python SDK currently exposes sandbox selection differently depending
on where it is used: thread lifecycle methods accept `SandboxMode`,
while turns accept the lower-level `SandboxPolicy` shape. For the common
case of choosing an access level, that leaks app-server wire details
into otherwise straightforward SDK usage.
This makes the common path explicit and discoverable: callers choose a
named sandbox preset once, using the same keyword on threads and turns.
The preset name `workspace_write` also makes the granted capability
clear at the callsite.
## What changed
- Added a root-level `Sandbox` enum with documented presets:
- `Sandbox.read_only`: read files without allowing writes.
- `Sandbox.workspace_write`: the normal default for projects with a
recorded trust decision; read files and write inside the workspace and
configured writable roots.
- `Sandbox.full_access`: run without filesystem access restrictions.
- Documented that omitting `sandbox=` delegates to app-server's
configured default, while explicit turn overrides remain sticky for
subsequent turns.
- Updated sync and async thread lifecycle and turn APIs to consistently
accept `sandbox=Sandbox...`, translating to the existing app-server
thread and turn representations internally.
- Updated the public API artifact generator so regenerated SDK wrappers
retain the friendly enum shape.
- Replaced low-level policy construction in Python docs, examples, and
the walkthrough notebook with the preset API.
- Added focused coverage for root exports, method signatures,
preset-to-wire mapping, and rejection of raw string sandbox inputs.
## API impact
High-level turn calls now use `sandbox=` instead of `sandbox_policy=`:
```python
from openai_codex import Codex, Sandbox
with Codex() as codex:
thread = codex.thread_start(sandbox=Sandbox.workspace_write)
result = thread.run("Review the diff only.", sandbox=Sandbox.read_only)
```
`thread_start(...)` already defaults to `ApprovalMode.auto_review`, so
normal writable usage is concise:
```python
with Codex() as codex:
thread = codex.thread_start(sandbox=Sandbox.workspace_write)
thread.run("Update the files in this workspace.")
```
With that combination, edits inside `cwd` and configured writable roots
run within the workspace-write sandbox. Operations that require
approval, such as edits outside those roots, are routed through auto
review. When `sandbox=` is omitted, app-server resolves its configured
default. A sandbox supplied to `run(...)` or `turn(...)` applies to that
turn and subsequent turns.
## Test coverage
- `sdk/python/tests/test_public_api_signatures.py` covers the public
export and parameter names, including the default approval mode.
- `sdk/python/tests/test_public_api_runtime_behavior.py` covers preset
mappings to the existing wire types and raw string rejection.
## Summary
- Add `request_kind` values for foreground turn, startup prewarm,
compaction, and detached memory model requests.
- Attach compaction dispatch metadata to local Responses, legacy
`/v1/responses/compact`, and remote v2 compact requests.
- Add the existing logical context-window identifier as `window_id` on
turn-owned model request metadata.
- Keep identity fields optional for detached memory requests, while
still emitting `request_kind="memory"` in non-git/no-sandbox workspaces.
## Root Cause
`x-codex-turn-metadata` has more than one producer. Foreground turns and
compaction requests own a real turn and should carry that turn identity.
Detached memory stage-one requests do not own a foreground turn, so
absent identity fields are valid rather than missing data. Startup
websocket prewarm is also a model request, but it has `generate=false`
and must not be counted as a foreground turn.
`thread_source` or session source identifies where a thread came from
(for example review, guardian, or another subagent). `request_kind`
identifies what the current outbound model request is doing (`turn`,
`prewarm`, `compaction`, or `memory`). A review or guardian thread can
issue either a normal turn request or a compaction request, so source
cannot replace request kind.
## Behavior / Impact
- Ordinary foreground requests send `request_kind="turn"`, their real
identity fields, and `window_id="<thread_id>:<window_generation>"`.
- Startup websocket warmup requests send `request_kind="prewarm"` so
they are not counted as foreground turns.
- Compaction requests send `request_kind="compaction"`, their real
owning turn identity, the existing `window_id`, and
`compaction.{trigger,reason,implementation,phase,strategy}`.
- Detached memory stage-one requests send `request_kind="memory"`
without `session_id`, `thread_id`, `turn_id`, or `window_id`; when no
workspace metadata exists, the kind-only header is still emitted.
- `session_id`, `thread_id`, `turn_id`, and `window_id` remain optional
in the header schema because detached memory requests do not own a
foreground turn or context window.
- `window_id` is not a new ID system: it is copied from the already-sent
`x-codex-window-id` / WS client metadata value at model-request dispatch
time.
- Existing `x-codex-window-id` HTTP/WS emission, value format,
generation advancement, resume behavior, and fork reset behavior are
unchanged.
- `request_kind`, `window_id`, and upstream turn-owned identity fields
remain schema-owned; input `responsesapi_client_metadata` cannot replace
their canonical values.
- No table, DAG, export, app-server API, or MCP `_meta` schema changes
are included.
A compaction attempt stopped by a pre-compact hook issues no model
request and therefore has no request header; its outcome remains in
analytics events. Status, error, duration, and token deltas also remain
analytics fields rather than request-header fields.
Future detached-memory attribution using a real initiating turn ID as
`trigger_turn_id` is intentionally not part of this PR.
## Sync With Main
- Final pushed head `716342e79` is rebased onto `origin/main@0d37db4b2`.
- The metadata conflict came from upstream `#24160`, which added
`forked_from_thread_id` on the same `turn_metadata` surface. Resolution
preserves that field and its protection from client metadata override
alongside this PR's request-kind, compaction, and window-id fields.
- While resolving the overlapping commits, I removed an accidental
recursive model-request overlay and a duplicate detached-memory header
builder before completing the rebase.
## Latency / User Experience Boundary
- Foreground turns perform no new filesystem, git, or network work. New
fields are inserted into metadata already serialized for outgoing
requests.
- Compaction issues the same model/HTTP requests with the same prompt,
model, service tier, and sampling settings; only metadata bytes change.
- Startup prewarm already sent metadata; it is now correctly classified
as `prewarm`.
- Non-git detached memory now sends a small kind-only metadata header
rather than no header.
- This client diff adds no user-visible latency mechanism beyond
negligible serialization and header bytes on already-existing requests.
## Validation
On conflict-resolved head `1d35c2cfb` based on `origin/main@487521733`:
- `just fmt` (passed)
- `just fix -p codex-core` (passed)
- `git diff --check origin/main...HEAD` (passed)
- `just test -p codex-core -E 'test(turn_metadata) |
test(websocket_first_turn_uses_startup_prewarm_and_create) |
test(responses_stream_includes_turn_metadata_header_for_git_workspace_e2e)
|
test(responses_websocket_forwards_turn_metadata_on_initial_and_incremental_create)
| test(remote_compact_v2_retries_failures_with_stream_retry_budget) |
test(window_id_advances_after_compact_persists_on_resume_and_resets_on_fork)'`
(`23 passed`; `bench-smoke` passed)
- `just test -p codex-app-server -E
'test(turn_start_forwards_client_metadata_to_responses_request_v2) |
test(turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2)
| test(auto_compaction_remote_emits_started_and_completed_items)'` (`3
passed`; `bench-smoke` passed)
- `just test -p codex-memories-write` (`29 passed`; `bench-smoke`
passed)
## Context
`docs/tui-chat-composer.md` was removed by #20896 as part of removing
local-only docs/specs from the repository. I checked the #20896 file
list and the merge commit: the composer doc was deleted, not moved or
copied, and current `main` does not contain a replacement composer
narrative doc.
Current guidance should keep contributors and agents focused on the docs
that still exist: the module docs in `chat_composer.rs` and
`paste_burst.rs`.
## Summary
- Removes the scoped TUI bottom-pane AGENTS.md requirement to update
`docs/tui-chat-composer.md`.
- Removes stale module-doc references to that deleted narrative doc from
`chat_composer.rs` and `paste_burst.rs`.
## Validation
- Checked #20896 and the merge commit with rename/copy detection to
confirm `docs/tui-chat-composer.md` was deleted rather than moved.
- Searched current `main` for a replacement composer narrative doc.
- Not run; documentation-only change.
This adds slash command completion behavior for argument-taking
commands, where text after the partially typed command becomes inline
arguments instead of being discarded. This addresses the workflow of
drafting text first, moving to the start, and completing a slash command
around that existing draft. Before this change, this workflow would
remove all user-input text aside from the slash command, which can be
frustrating if the user had just typed out a long and well thought out
goal.
- Preserves the draft tail for inline-argument slash commands like
`/goal` and `/review` when completing with `Tab` or `Enter`.
- Keeps popup filtering focused on the command fragment under the cursor
rather than the full draft text.
- Leaves slash commands that do not support inline arguments unchanged,
so completion still replaces the existing draft tail for those commands.
- Adds focused TUI tests under slash input covering preserved arguments,
cursor edge cases, and the negative case for a command without inline
args.
Follow-up simplification and test relocation from #24683 folded into
this PR.
---------
Co-authored-by: Eric Traut <etraut@openai.com>
move `DEV_WEBSITE_VERCEL_DEPLOY_HOOK_URL` to a repo environment secret.
to keep scope of use of that env secret small, move the vercel website
redeploy to its own post-release job.
# Summary
The standalone update action currently downloads and runs the Codex
installer as an interactive command. When an existing managed Codex
install is present, accepting an update can therefore enter an installer
prompt instead of completing the update.
This change runs the standalone installer with `CODEX_NON_INTERACTIVE=1`
on macOS/Linux and Windows. The installer environment-variable support
is introduced by the parent PR; this PR wires that behavior into the
Codex CLI update action. The rendered Windows command remains
shell-safe, and long update commands wrap within the update-notice card.
The standard test target snapshots the standalone notice for both
platforms.
# Stack
1. [#21567](https://github.com/openai/codex/pull/21567) - Adds
environment-controlled release selection and noninteractive installer
behavior.
2. [#24637](https://github.com/openai/codex/pull/24637) - Runs
standalone updates with `CODEX_NON_INTERACTIVE=1`. (current)
3. [#24639](https://github.com/openai/codex/pull/24639) - Removes
explicit release argument inputs in favor of `CODEX_RELEASE`.
# Evidence
Standalone updater-shaped macOS install with an existing npm-managed
Codex on `PATH`:
https://github.com/user-attachments/assets/a27fe9e9-db3a-4c39-a514-24bd3d1f01e8
# Testing
Tests: targeted `codex-tui` update-action and update-notice snapshot
tests, Rust formatting, benchmark smoke validation, macOS live-terminal
standalone-update smoke testing, Windows ARM64 PowerShell
standalone-update smoke testing through Parallels, and CI.
## Why
Codex stores thread, log, goal, and memory state in bundled SQLite
databases through SQLx. We have a suspected SQLite WAL-reset corruption
issue under heavy concurrent writer load, especially when multiple
subagents are active. The existing `sqlx 0.8.6` dependency kept us on an
older `libsqlite3-sys` / bundled SQLite, so this PR moves the SQLx stack
far enough forward to pick up the newer bundled SQLite library.
## What changed
- Bump the workspace `sqlx` dependency to `0.9.0`.
- Use the SQLx 0.9 feature names explicitly: `runtime-tokio`,
`tls-rustls`, and `sqlite-bundled`.
- Update `Cargo.lock` so `sqlx-sqlite` resolves through `libsqlite3-sys
0.37.0`.
- Refresh `MODULE.bazel.lock` for the dependency changes.
- Adapt `codex-state` to SQLx 0.9:
- build dynamic state queries with `QueryBuilder<Sqlite>` instead of
passing dynamic `String`s to `sqlx::query`;
- remove the old `QueryBuilder` lifetime parameter from helper
signatures;
- preserve SQLx's new `Migrator` fields when constructing runtime
migrators.
## Verification
- `just test -p codex-state`
- `just bazel-lock-check`
- `cargo check -p codex-state --tests`
## Why
The TUI Vim composer currently diverges from normal Vim editing in two
common workflows: pressing `e` repeatedly can remain stuck at an
existing word end, and normal mode does not support `C` for changing
through the end of the line. The existing `D` behavior also removes the
newline when the cursor is already at the line boundary, which makes the
new `C` action and existing deletion action surprising in multiline
prompts.
Closes#23926.
Closes#24238.
## What Changed
- Make normal-mode `e` advance from the current word end to the next
word end, including for operator motions such as `de`.
- Add configurable Vim normal-mode `change_to_line_end` behavior, bound
to `C` by default, which deletes to the end of the current line and
enters Insert mode.
- Keep the newline intact when `D` or `C` is pressed at the end-of-line
boundary.
- Add regression coverage for repeated `e`, `de`, `C`, and the multiline
`C`/`D` boundary behavior.
- Regenerate the config schema and update the keymap picker snapshots
for the new Vim action.
## How to Test
1. Run Codex with Vim composer mode enabled:
```bash
cd codex-rs
cargo run --bin codex -- -c tui.vim_mode_default=true
```
2. Enter `alpha beta gamma`, press `Esc`, `0`, then press `e`
repeatedly.
Confirm the cursor advances through the ends of `alpha`, `beta`, and
`gamma`.
3. Enter `hello world`, press `Esc`, `0`, `w`, then `C`.
Confirm `world` is deleted and the composer enters Insert mode.
4. Enter a multiline prompt with `hello` above `world`, press `Esc`,
`k`, `$`, and then `D`.
Confirm the newline is preserved and the two lines do not join.
5. At the same boundary, press `C` and type `!`.
Confirm the composer enters Insert mode and yields `hello!` above
`world`, preserving the newline.
Targeted automated verification:
- `just fix -p codex-tui`
- `just argument-comment-lint-from-source -p codex-tui -p codex-config`
- `cargo insta pending-snapshots` reports no pending snapshots.
- `just test -p codex-tui` validates the new Vim and keymap snapshot
coverage, but the command remains red due to two reproducible unrelated
failures in `app::tests::update_feature_flags_disabling_guardian_*`.
## Validation Note
The workspace-wide `just argument-comment-lint` form is currently
blocked during Bazel analysis by the existing LLVM `compiler-rt` missing
`include/sanitizer/*.h` failure; package-scoped source linting for the
changed Rust crates passed.
## Why
Plugin and marketplace mutations are applied by the app server, but
several TUI follow-up paths still refreshed state from the TUI host
config. In remote workspace mode, that can leave plugin UI state tied to
stale client-local `config.toml` after the server has already applied
the mutation.
## What
- Stop reloading the TUI host config after app-server-owned plugin,
marketplace, skill, and app mutations.
- Use the same app-server-owned refresh path for local and remote
sessions: ask the app server to reload user config where the running
session needs it, then refetch plugin list/detail state from the app
server.
- Build plugin mention candidates from existing app-server `plugin/list`
and `plugin/read` data in both local and remote sessions instead of
TUI-host plugin config.
- Avoid the duplicate local config reload after `ReloadUserConfig` asks
the app server to reload config.
## Verification
Manually launched a local WebSocket app-server with a temp server
`CODEX_HOME`, launched the TUI with a separate temp host `CODEX_HOME`
and `--remote`, installed a sample plugin from a temp local marketplace
through `/plugins`, and confirmed the TUI refreshed to installed state
while only the server config gained `[plugins."sample@debug"]`. Trace
logs showed the TUI using app-server `plugin/list` and `plugin/read` for
the refresh path.
## Summary
- Change last-`n` fork truncation to start at the first fork-turn
boundary instead of returning the full rollout when the fork history is
shorter than the requested window.
- Add coverage for the startup-prefix case in both rollout truncation
tests and agent control spawn behavior.
- Ensure bounded forked children still rebuild context after the cached
prefix is truncated.
## Testing
- Added unit coverage for truncation behavior when the parent history is
under the requested fork-turn limit.
- Added an agent control test covering bounded fork spawn behavior with
startup context present.
- Not run (not requested).
## Why
Extensions can currently observe thread start, resume, and stop, but
they do not have a lifecycle point for the host to say that immediately
pending thread work has drained. That makes idle follow-up behavior
harder to express as extension-owned logic instead of host-specific
plumbing.
This adds an explicit idle lifecycle hook so an extension can react when
a thread becomes idle while the host keeps ownership of whether any
submitted follow-up input starts a turn, is queued, or is ignored.
## What changed
- Added `ThreadIdleInput` with access to the session-scoped and
thread-scoped extension stores.
- Added a default `on_thread_idle` method to
`ThreadLifecycleContributor`.
- Re-exported `ThreadIdleInput` from the extension API surface.
## Testing
Not run; this only extends the extension API trait surface with a
default hook and exported input type.
## Summary
- Add the missing additional_context field to the guardian review
Op::UserInput test initializer.
## Test plan
- just fmt
- just test -p codex-core guardian_review
- just test -p codex-core (compiles, then fails on local environment
issues: sandbox-exec Operation not permitted, missing test_stdio_server
helper binary, and unrelated timeouts)
## Why
The extracted goal runtime needs a host-callable path for turns that
stop because the workspace usage limit is reached. In that case, any
in-turn goal progress should be accounted before the goal becomes
terminal, and active goal accounting must be cleared so later
tool-finish or turn-stop handling does not keep charging usage to a
stopped goal.
## What changed
- Adds `GoalRuntimeHandle::usage_limit_active_goal_for_turn`, which
accounts current active-goal progress, marks the active or
budget-limited thread goal as `UsageLimited`, records terminal metrics
when the status changes, clears active goal accounting, and emits the
updated goal event.
- Covers both active and budget-limited goals in
`ext/goal/tests/goal_extension_backend.rs`, including the invariant that
later token/tool events do not add usage after the goal has been
usage-limited.
## Testing
- Added
`usage_limit_active_goal_accounts_progress_and_clears_accounting`.
- Added `usage_limit_budget_limited_goal_accounts_remaining_progress`.
This reverts commit 5381240f57. Gov cloud
should not be supported
# External (non-OpenAI) Pull Request Requirements
External code contributions are by invitation only. Please read the
dedicated "Contributing" markdown file for details:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
## Summary
Clear inherited legacy `notify` from Guardian review session config,
since we should not be passing auto review threads into `notify`
targets. Keeps legacy notify payload and hook runtime behavior unchanged
for normal user turns.
## Testing
- [x] add a Guardian config regression and dedicated Guardian
integration test so review sessions cannot inherit parent notify hooks
# Summary
The Codex standalone installers can pause after installation to ask
about an older managed install or launching Codex. That makes unattended
bootstrap and update flows hard to complete reliably.
This PR adds noninteractive installer control on macOS/Linux and Windows
through `CODEX_NON_INTERACTIVE=1`. Noninteractive operation is
environment-only, which gives automated callers one stable way to
suppress prompts. When a noninteractive install leaves an older npm,
bun, or brew-managed Codex installed, the standalone bin is configured
ahead of that command on `PATH` so the newly installed Codex is the one
future launches select. It also supports `CODEX_RELEASE` for callers
that select a release through environment variables while retaining the
existing explicit release inputs. Release selection accepts `latest`,
stable `x.y.z` versions, and Codex prereleases written as
`rust-v0.134.0-alpha.3`, `v0.134.0-alpha.3`, or `0.134.0-alpha.3`; it
validates that shape before constructing release requests.
# Stack
1. [#21567](https://github.com/openai/codex/pull/21567) - Adds release
and noninteractive environment controls to the installers. (current)
2. [#24637](https://github.com/openai/codex/pull/24637) - Runs
standalone updater installs with `CODEX_NON_INTERACTIVE=1`.
3. [#24639](https://github.com/openai/codex/pull/24639) - Removes
explicit release argument inputs in favor of `CODEX_RELEASE`.
# Evidence
| Before | After |
| --- | --- |
| 
| 
|
Environment-controlled macOS install with an existing npm-managed Codex
on `PATH`:
https://github.com/user-attachments/assets/442e0b5b-4a32-4bf5-996b-68784777380d
# Design decisions
Windows installs using the older standalone bin layout still require an
interactive migration confirmation. Noninteractive mode does not
auto-migrate that existing directory because replacing it is a
destructive transition for an early, limited-use layout; unattended
installs on that layout fail with an instruction to rerun interactively.
# Testing
Tests: installer syntax validation, release-selector acceptance and
rejection coverage including PowerShell `Latest` compatibility, macOS
live-terminal installer smoke testing with environment-controlled stable
and prerelease installation and competing PATH precedence, shell
rejection of the omitted noninteractive flag, and Windows ARM64
PowerShell smoke testing with environment-only noninteractive behavior,
retained release input, and competing PATH precedence through Parallels.
## Summary
- Bump the workspace Rust toolchain from `1.93.0` to `1.95.0` across
Cargo, Bazel, CI, release workflows, devcontainers, and the Codex
environment config.
- Refresh `MODULE.bazel.lock` so the Bazel Rust toolchain artifacts
match the new version.
- Leave purpose-specific toolchains unchanged, including the
`argument-comment-lint` nightly and the upstream `rusty_v8` `1.91.0`
build pin.
- Includes fixes for new lints from `just fix` and a few codex-authored
fixes for lints without a suggestion.
## Why
When a turn needs a follow-up request after tool output is recorded,
Codex can still appear stuck in `Thinking` before the next `/responses`
request is opened. The existing local trace showed the last completed
response and the absence of a new backend request, but it did not show
whether the stall was in tool-router preparation or later request setup.
Issue: N/A (internal incident investigation)
## What Changed
Added trace spans around the pre-stream tool-router handoff in
`core/src/session/turn.rs`, including the `built_tools` phase and the
MCP manager read lock.
Added per-server MCP tool-listing spans and trace breadcrumbs in
`codex-mcp/src/connection_manager.rs` with startup snapshot /
startup-complete state so a pending MCP client is visible in feedback
logs instead of looking like a silent hang.
## Verification
- `just fmt`
- `just test -p codex-mcp`
- `just test -p codex-core` (prior full rerun fails in this workspace on
unrelated integration tests: code-mode output length expectations, one
shell timeout formatting assertion, and shell snapshot timeouts; latest
review-fix rerun compiled and passed 1160 tests before I stopped the
abnormally slow unrelated suite)
add new `parse_tool_input_schema_without_compaction` to bypass the
existing compaction/trimming of client-provided tool schemas that are
over 4k bytes.
we want this for standalone web search to keep field guidance/metadata
on certain fields; this keeps us closer to parity with existing hosted
tool schema (which didnt go through this 4k byte filter).
## Why
`continuation_turn_id` was introduced to distinguish synthetic goal
continuation turns for the no-tool continuation suppression heuristic.
#20523 removed that heuristic, but left the marker behind. It is still
written and cleared without affecting any runtime decision.
## What Changed
- Remove `GoalRuntimeState::continuation_turn_id`.
- Remove the marker setter/clearer and their now-no-op start, finish,
and abort call sites.
## Testing
- Not run yet (deferred at request).
## Why
- Runtime analytics events report `thread_id`, which identifies the
individual thread emitting an event
- They don't report `session_id`, which identifies the shared session
for a root thread and its subagent threads
- Emitting both identifiers allows analytics to group related activity
## What Changed
- Adds `session_id` to relevant analytics events (thread_initalized,
turn, turn_steer, compaction, guardian_review)
- Tracks each thread's session ID in the analytics reducer so subsequent
thread scoped events emit the same value
- Carries the shared session ID through subagent initialization
## Verification
- `just test -p codex-analytics` validates event payloads and subagent
session grouping.
- Focused `codex-app-server` tests validate session IDs for thread,
turn, and steer events.
- Focused `codex-core` tests validate root and subagent session ID
propagation.
## Why
Older persisted rollouts can contain `input_image.detail` values of
`auto` or `low` from before `ImageDetail` was narrowed to
`high`/`original`. Current deserialization rejects those values, which
can make resume skip later compacted checkpoints and reconstruct an
oversized raw suffix before the next compaction attempt.
Confirmed Sentry reports fixed by this compatibility path:
- [CODEX-1H3F](https://openai.sentry.io/issues/7500642496/)
- [CODEX-1H6N](https://openai.sentry.io/issues/7501025347/)
- [CODEX-1JDP](https://openai.sentry.io/issues/7504549065/)
- [CODEX-1HW6](https://openai.sentry.io/issues/7503407986/)
## Background
[openai/codex#20693](https://github.com/openai/codex/pull/20693) added
image-detail plumbing for app-server `UserInput` so input images could
explicitly request `detail: original`. The Slack discussion behind that
PR was about ScreenSpot / bridge evals where user input images were
resized, while tool output images already had MCP/code-mode ways to
request image detail.
In review, the intended new API surface was narrowed to `high` and
`original`: default to `high`, allow `original` when callers need
unchanged image handling, and avoid encouraging new `auto` or `low`
usage. That policy still makes sense for newly emitted values.
The missing compatibility piece is persisted history. Older rollouts can
already contain `auto` and `low`, and resume reconstructs typed history
by deserializing those rollout records. Rejecting old values at that
boundary causes valid compacted checkpoints to be skipped. This PR
restores `auto` and `low` as real variants so old records deserialize
and round-trip without being rewritten as `high`, while product paths
can continue to default to `high` and avoid emitting `auto` for new
behavior.
## What changed
- Restored `ImageDetail::Auto` and `ImageDetail::Low` as first-class
protocol values.
- Preserved `auto`/`low` through rollout deserialization, MCP image
metadata, code-mode image output, and schema/type generation.
- Kept local image byte handling conservative: only `original` switches
to original-resolution loading; `auto`/`low`/`high` continue through the
resize-to-fit path while retaining their detail value.
- Added regression coverage for enum round-tripping and code-mode `low`
detail handling.
## Testing
- `just write-app-server-schema`
- `just test -p codex-protocol`
- `just test -p codex-tools`
- `just test -p codex-code-mode`
- `just test -p codex-app-server-protocol`
- `just test -p codex-core
suite::rmcp_client::stdio_image_responses_preserve_original_detail_metadata`
- `just test -p codex-core
suite::code_mode::code_mode_can_use_mcp_image_result_with_image_helper`
- Loaded broken rollouts on local fixed builds, and started/completed
new turns.
I also attempted `just test -p codex-core`; the local broad run did not
finish green: 2559 tests run, 2467 passed, 55 flaky, 91 failed, 1 timed
out. The failures were broad timeout/deadline failures across unrelated
areas; targeted changed-path core tests above passed.
## Why
Windows sandbox diagnostics are currently hard to recover from
`/feedback` even though they are often the most useful artifact when
debugging sandbox behavior. Now that sandbox logging uses daily rolling
files, feedback can safely include the current day's sandbox log without
uploading the old ever-growing legacy `sandbox.log`.
## What changed
- Add a `codex-windows-sandbox` helper that resolves the current daily
sandbox log from `codex_home`.
- When feedback is submitted with logs enabled on Windows, app-server
attaches today's sandbox log if it exists.
- Upload the attachment under the stable filename `windows-sandbox.log`,
independent of the dated on-disk filename.
- Keep existing raw `extra_log_files` behavior unchanged for rollout and
desktop log attachments.
## Verification
- `cargo fmt -p codex-app-server -p codex-windows-sandbox`
- `cargo test -p codex-windows-sandbox
current_log_file_path_for_codex_home_uses_sandbox_dir`
- `cargo test -p codex-app-server
windows_sandbox_log_attachment_uses_current_log`
- Manual CLI/TUI `/feedback` test confirmed Sentry received
`windows-sandbox.log`.
## Why
Remote image submissions currently wrap native `input_image` spans in
literal `<image>` and `</image>` text spans. Those extra prompt tokens
add structure without providing label or routing information.
## What Changed
- Serialize `UserInput::Image` directly as an `input_image` content
span.
- Preserve named local-image framing and legacy wrapper parsing for
labeled attachments and existing histories.
- Update existing request-shape expectations for drag-and-drop images,
model switching, and compaction.
## Validation
- `just test -p codex-protocol`
- Focused `codex-core` run covering
`drag_drop_image_persists_rollout_request_shape`,
`model_change_from_image_to_text_strips_prior_image_content`, and
`snapshot_request_shape_pre_turn_compaction_including_incoming_user_message`
## Notes
- A broader `just test -p codex-core` run was attempted; the affected
tests passed, while the overall run failed in unrelated CLI, MCP, and
tooling tests plus a `thread_manager` timeout.
## Why
The Windows sandbox runner still carried the old `SandboxPolicy`
compatibility path even though core now computes `PermissionProfile`.
That meant Windows command-runner execution could only see the legacy
projection, so profile-only filesystem rules such as deny globs were not
part of the runner input.
## What Changed
- Removed the Windows-local `SandboxPolicy` parser/export and deleted
`windows-sandbox-rs/src/policy.rs`.
- Changed restricted-token capture/session setup, elevated setup,
world-writable audit, read-root grant, and command-runner session APIs
to accept `PermissionProfile` plus the profile cwd.
- Bumped the elevated command-runner IPC protocol to version 2 because
`SpawnRequest` now carries `permission_profile` /
`permission_profile_cwd` instead of the legacy `policy_json_or_preset` /
`sandbox_policy_cwd` fields.
- Updated core exec, unified exec, debug-sandbox, TUI setup/grant flows,
and app-server setup to pass the actual effective `PermissionProfile`.
- Left regression coverage asserting the old IPC policy fields are
absent and the runner serializes tagged `PermissionProfile` JSON.
## Verification
- `cargo test -p codex-windows-sandbox`
- `cargo test -p codex-core windows_sandbox`
- `cargo test -p codex-app-server
request_processors::windows_sandbox_processor`
- `just fix -p codex-windows-sandbox -p codex-core -p codex-app-server
-p codex-cli -p codex-tui`
- `just fix -p codex-cli -p codex-tui`
- `just fix -p codex-windows-sandbox -p codex-tui`
- `rg "\\bSandboxPolicy\\b" codex-rs/windows-sandbox-rs` returned no
matches.
Note: `cargo test -p codex-cli` was attempted but did not reach crate
tests because local disk filled while compiling dependencies (`No space
left on device`). The targeted clippy pass compiled the affected CLI/TUI
surfaces afterward.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23813).
* #24108
* __->__ #23813
Fixes#24249.
## Why
Codex already supports discovering marketplaces under both
`.agents/plugins/marketplace.json` and
`.claude-plugin/marketplace.json`. The Git marketplace auto-upgrade
no-op check only looked for the `.agents` layout. That meant an
installed `.claude-plugin` marketplace with matching revision metadata
still looked absent, so plugin list/startup upgrade work could stage and
re-activate the same marketplace again.
That matches the failure shape in #24249: the report called out repeated
marketplace sync/cache refresh logs and a large recently-touched
`.tmp/marketplaces/.staging` directory. This change makes the
auto-upgrade path recognize the installed `.claude-plugin` marketplace
as already current, which should remove that staging/activation feedback
loop.
## What changed
`codex-rs/core-plugins/src/marketplace_upgrade.rs` now uses the existing
supported marketplace manifest discovery helper when deciding whether an
installed Git marketplace is already current. Existing local plugin
source validation is unchanged; `source: "./"` still remains invalid.
## Confidence
Confidence is high that this fixes the repeated marketplace upgrade
path: the old hardcoded layout check was definitely wrong for installed
`.claude-plugin` marketplaces, and the reported staging churn points
directly at that path.
Confidence is not 100% because we do not have a CPU profile or a fully
re-run reporter repro. A malformed marketplace entry can still be logged
as invalid if another caller repeatedly lists plugins; this PR fixes the
staging/upgrade feedback loop that likely made the failure pathological,
not every possible source of repeated marketplace resolution.
## Summary
TUI plugin mention refresh still joined app-server plugin inventory with
client-local plugin config, which can diverge once plugin state is owned
by the app server.
This changes the TUI to mirror the GUI client: `plugin/list` is the
autocomplete source, and mention candidates are plugin-level entries
filtered to installed, enabled, and not disabled by admin. The TUI no
longer reads local plugin config or calls `plugin/read` while refreshing
plugin mention candidates.
## API shape and limitations
The current app-server API does not expose effective per-session plugin
capability summaries for mention autocomplete. As in the GUI,
autocomplete now trusts `plugin/list` metadata rather than proving which
plugin capabilities are loaded in the active session.
That avoids stale client-local reads and the cwd/remote detail gaps in
`plugin/read`, but intentionally accepts the same list-level tradeoff as
the app: if `plugin/list` reports a remote plugin before its local
bundle is materialized, the plugin can still appear as a mention
candidate.
## Why
When Codex calls responsesapi, we currently send `session_id`,
`thread_id`, and `turn_id` among other things as
`client_metadata["x-codex-turn-metadata"]`. This PR adds
`forked_from_thread_id` which helps explain the "lineage" of a forked
thread.
## What's changed
- Track the immediate history source copied into a forked thread through
thread/session creation, including subagent and review turn metadata
paths.
- Include `forked_from_thread_id` in Codex turn metadata while
preventing turn-scoped Responses API client metadata from overwriting
Codex-owned lineage fields.
- Add coverage for fork lineage in turn metadata and the app-server
Responses API request path.
Fixes#24186.
## Why
When the TUI resumes a thread through the local app-server daemon with a
selected workspace, `thread/resume` can hit an already-loaded but idle
cached thread. That path previously rejoined the cached `CodexThread`,
so cwd/config overrides in `ThreadResumeParams` were ignored and the
resumed session kept using the old cwd.
## What changed
App-server now treats a loaded-but-idle thread with no subscribers as a
cache entry when resume overrides differ: it unloads that cached thread
and lets the normal resume path rebuild it with the requested
cwd/config. Threads that still have subscribers, or active runtime work,
continue to rejoin the existing loaded thread so in-flight state remains
observable.
The existing thread teardown helper was generalized from
archive-specific cleanup to shared unload cleanup for this path.
## Why
When the app-server remote-control websocket path stalls during
connection setup or teardown, the existing logs do not show where the
task stopped, and several awaits can keep the task from returning
promptly. That makes offline or stale-host incidents hard to distinguish
from expected shutdown or disable flow.
Issue: N/A (internal incident investigation)
## What Changed
Added structured lifecycle and status logging around remote-control
enable/disable requests, websocket task startup and exit, connection
cycles, enrollment context, and status/environment transitions.
Bound websocket connect, transport-event forwarding, and
connection-worker shutdown waits. On timeout, the code logs the stalled
operation and stops or aborts workers so the loop can reconnect or exit
instead of waiting indefinitely. Ping sends now also observe shutdown
cancellation.
## Summary
Adds experimental `additionalContext` support to `turn/start` and
`turn/steer` so clients can provide ephemeral external context, such as
browser or automation state, without turning that plumbing into a
visible user prompt or triggering user-prompt lifecycle behavior.
## API Shape
The parameter shape is:
```ts
additionalContext?: Record<string, {
value: string
kind: "untrusted" | "application"
}> | null
```
Example:
```json
{
"additionalContext": {
"browser_info": {
"value": "Active tab is CI failures.",
"kind": "untrusted"
},
"automation_info": {
"value": "CI rerun is in progress.",
"kind": "application"
}
}
}
```
The keys are opaque and caller-defined.
## Context Injection
When provided, accepted entries are inserted into model context as
hidden contextual message items, not as visible thread user-message
items.
`kind: "untrusted"` entries are inserted with role `user`:
```text
<external_${key}>${value}</external_${key}>
```
`kind: "application"` entries are inserted with role `developer`:
```text
<${key}>${value}</${key}>
```
Values are not escaped. Each value is truncated to 1k approximate tokens
before wrapping.
For `turn/start`, accepted additional context is inserted before normal
user input. For `turn/steer`, additional context is merged only when the
steer includes non-empty user input; context-only steers still reject as
empty input.
## Dedupe Strategy
`AdditionalContextStore` lives on session state and stores the latest
complete additional-context map.
Each `turn/start` or non-empty `turn/steer` treats its
`additionalContext` as the current complete set of values. Entries are
injected only when the key is new or the exact entry for that key
changed, including `value` or `kind`. After merging, the store is
replaced with the provided map, so omitted keys are removed from the
retained set and can be injected again later if reintroduced.
Omitting `additionalContext`, passing `null`, or passing an empty object
resets the store to empty and injects nothing.
## What Changed
- Threads experimental v2 `additionalContext` through app-server into
core turn start and steer handling.
- Adds separate contextual fragment types for untrusted user-role
context and application developer-role context.
- Uses pending response input items so additional context can be
combined with normal user input without treating it as prompt text.
- Adds integration coverage for start/steer flow, role routing,
dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
behavior, empty context-only steer rejection, external-fragment marker
matching, and truncation.
## Summary
Fix the TUI `$` app mention paths so App Directory rows that are not
accessible are not treated as usable apps.
This includes the core preservation fix from #24104, but expands it to
the other app mention paths:
- preserve app-server `is_accessible` flags when partial
`app/list/updated` snapshots reach the TUI
- require apps to be both accessible and enabled when resolving exact
`$slug` mentions
- require restored/stale `app://...` bindings to point at accessible,
enabled apps before emitting structured app mentions
- remove the now-unused `codex-chatgpt` dependency from `codex-tui`,
which addresses the `cargo shear` failure seen on #24104
## Root Cause
The app server already sends merged app snapshots with accessibility
computed. The TUI handled app-server app list updates as partial app
loads and re-ran the old accessible-app merge path. That path treated
every notification row as accessible, so App Directory entries with
`isAccessible=false` could appear in `$` suggestions.
Regression source: #22914 routed app-list updates through the app server
while reusing the old TUI partial-load handling. Related precursor:
#14717 introduced the partial-load path, but #22914 made it user-visible
for app-server updates.
## Issues
Fixes#24145Fixes#24205Fixes#24319
## Validation
- `just fmt`
- `git diff --check`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `just argument-comment-lint -p codex-tui`
- `just test -p codex-tui
chatwidget::tests::popups_and_settings::apps_notification_update_excludes_inaccessible_apps_from_mentions
chatwidget::tests::composer_submission::submit_user_message_ignores_inaccessible_app_mentions_from_bindings
chatwidget::skills::tests::find_app_mentions_requires_accessible_enabled_apps_for_bound_paths
chatwidget::skills::tests::find_app_mentions_requires_accessible_enabled_apps_for_slugs`
## Why
Raw output mode intentionally sends logical source lines to the terminal
without Codex-inserted wrapping so copied content retains its original
line structure. In Zellij, soft-wrapped continuation rows from those raw
lines are not confined by the inline history scroll region. When raw
mode replays a long transcript, continuation rows can occupy the
composer viewport and are overwritten on the following draw, leaving the
transcript visibly truncated underneath the composer.
This is specific to the combination of Zellij and raw terminal-wrapped
history. Rich output and non-Zellij terminals should continue using the
existing insertion behavior.
Related context: #20819 introduced raw output mode, and #22214 removed
the broad Zellij insertion workaround after the standard rich-output
path no longer required it.
| Before | After |
|---|---|
| <img width="1728" height="916" alt="image"
src="https://github.com/user-attachments/assets/f85398a5-e930-46d9-bcfd-106a24c41466"
/> | <img width="1723" height="912" alt="image"
src="https://github.com/user-attachments/assets/5c62e16a-a6e5-4842-bcb2-eab163cda04c"
/> |
## What Changed
- Cache Zellij detection in `Tui` and select a dedicated insertion mode
only for `HistoryLineWrapPolicy::Terminal` batches in Zellij.
- For that guarded path, clear the existing viewport, append raw source
lines through the terminal so its soft wrapping remains
selection-friendly, and reserve empty viewport rows before redrawing the
composer.
- Add snapshot regressions for both an incremental soft-wrapped raw
insert and an overflowing raw transcript replay that starts at the top
of the cleared terminal.
## How to Test
1. Start Codex inside Zellij with raw output enabled or toggle raw
output after a multiline response is in history.
2. Produce or replay output containing long logical lines, such as a
fenced shell command with several wrapped lines.
3. Confirm the wrapped history remains visible above the composer and
the composer no longer overwrites the end of the response.
4. Toggle back to rich output or run outside Zellij and confirm standard
history rendering still behaves normally.
Targeted tests run:
- `just test -p codex-tui vt100_zellij_raw -- --nocapture`
Additional validation notes:
- `just test -p codex-tui` was attempted; the two new Zellij raw
insertion tests passed, while two existing
`app::tests::update_feature_flags_disabling_guardian_*` tests failed
outside this history insertion path.
- `just argument-comment-lint` was attempted but local Bazel analysis
fails before reaching the changed source because the LLVM `compiler-rt`
package is missing `include/sanitizer/*.h`. Modified literal callsites
were inspected manually.
## Summary
Add the extension-backed standalone `web.run` tool so Codex can call the
standalone search endpoint through the `codex-api` search client and
return its encrypted output to Responses.
- gate the new tool behind `standalone_web_search`
- install the extension in the app-server thread registry and hide
hosted `web_search` when standalone search is enabled for OpenAI
providers so the two paths stay mutually exclusive
- build search context from persisted history using a small tail
heuristic: previous user message, assistant text between the last two
user turns capped at about 1k tokens, and current user message
## Test Plan
- `cargo test -p codex-web-search-extension`
- `cargo test -p codex-api`
- `cargo test -p codex-core
hosted_tools_follow_provider_auth_model_and_config_gates`
## Summary
Generated memory rows and their stage-one/stage-two job state currently
live in `state_5.sqlite` alongside thread metadata. That makes memory
cleanup and regeneration share the main state schema even though those
rows are memory-pipeline data and can be rebuilt independently from the
durable thread records.
This PR moves the memory-owned tables into a dedicated
`memories_1.sqlite` runtime database while keeping thread metadata in
`state_5.sqlite`.
## Changes
- Adds a separate memories DB runtime, migrator, path helpers, telemetry
kind, and Bazel compile data for `state/memory_migrations`.
- Introduces `MemoryStore` behind `StateRuntime::memories()` and moves
memory table/job operations onto that store.
- Drops the old memory tables from the state DB and recreates their
schema in `state/memory_migrations/0001_memories.sql`.
- Updates memory startup, citation usage tracking, rollout pollution
handling, `debug clear-memories`, and app-server `memory/reset` to
operate through the memories DB.
- Preserves cross-DB behavior by hydrating thread metadata from the
state DB when selecting visible memory outputs and checking stage-one
staleness.
## Verification
- Added/updated `codex-state` tests for deleted-thread memory visibility
and already-polluted phase-two enqueue behavior.
- Updated `debug clear-memories`, app-server `memory/reset`, and
memories startup tests to seed and assert memory rows through
`memories_1.sqlite`.
## Why
Goal idle accounting is supposed to survive a thread resume. Previously,
the resume hook restored the active goal state inline from the extension
lifecycle contributor, which left the runtime handle without a reusable
restoration path and made the behavior hard to cover directly. When a
thread with an active goal was resumed, goal accounting could lose track
of the active idle goal instead of continuing to accrue elapsed time.
## What changed
- Moved thread-resume restoration into
`GoalRuntimeHandle::restore_after_resume()` so the runtime owns
rehydrating active goal accounting from persisted thread goal state.
- Kept disabled goal runtimes as a no-op and preserved the existing
warning path when persisted goal state cannot be loaded.
- Added a backend regression test that seeds an active goal, resumes the
thread, waits briefly, and verifies elapsed idle time is reflected on
the next external goal mutation.
## Testing
- Not run locally; this metadata update only rewrote the PR title/body.
## Why
Codex 0.131 started enabling tmux `modifyOtherKeys` mode 2 when the
active tmux session reported `extended-keys-format csi-u`, and also when
that format could not be queried. The fallback was meant to help
compatible tmux panes enter extended-key mode, but it breaks iTerm2
control-mode sessions on older tmux.
Issue #23711 reproduces with:
```bash
ssh -t ubuntu@192.168.68.149 'tmux -CC new -A -s main'
```
On tmux 3.2a, `extended-keys-format` is not available. With mode 2
enabled, `Ctrl-C` is delivered as `^[[27;5;99~` instead of the normal
interrupt/control key path, so Codex does not handle it. Running with
`CODEX_TUI_DISABLE_KEYBOARD_ENHANCEMENT=1` restores `Ctrl-C`, which
points at keyboard mode setup rather than chat input routing.
## What Changed
- Only request `modifyOtherKeys` mode 2 when tmux explicitly reports
`extended-keys-format csi-u`.
- Treat an unknown or unavailable tmux extended-key format as
unsupported for this mode.
- Update the keyboard mode unit coverage so `None` no longer opts into
`modifyOtherKeys`.
This preserves the explicit modern tmux `csi-u` path from #21943 while
avoiding the unsafe fallback on older or unqueryable tmux setups.
## How to Test
Regression path from #23711:
1. Start iTerm2 tmux integration against an older tmux host:
```bash
ssh -t ubuntu@192.168.68.149 'tmux -CC new -A -s main'
```
2. Start patched Codex.
3. Run `/keymap debug`, press a regular key, then press `Ctrl-C`.
4. Confirm `Ctrl-C` closes the inspector and Codex remains responsive
without `CODEX_TUI_DISABLE_KEYBOARD_ENHANCEMENT=1`.
5. Confirm `Shift+Enter` still inserts a newline in the same session.
Modern tmux compatibility path:
1. Start an ordinary tmux 3.6a server with explicit `csi-u`:
```bash
tmux -L codex-csiu -f /dev/null new-session -d -s repro
tmux -L codex-csiu set-option -g extended-keys on
tmux -L codex-csiu set-option -g extended-keys-format csi-u
tmux -L codex-csiu attach -t repro
```
2. Start patched Codex.
3. From another terminal, confirm the Codex pane reports `mode=Ext 2`:
```bash
tmux -L codex-csiu list-panes -a -F '#{pane_id} mode=#{pane_key_mode}
cmd=#{pane_current_command}'
```
4. Type `one`, press `Shift+Enter`, type `two`, and confirm the composer
shows two lines without submitting.
5. Press `Ctrl-C` and confirm Codex handles it normally.
Targeted tests:
- `./tools/argument-comment-lint/run.py -p codex-tui -- --lib`
- `just test -p codex-tui` runs the new keyboard mode test successfully;
the full run currently reports two unrelated guardian feature-flag test
failures:
-
`app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`
-
`app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
No documentation update is needed.
## Why
`core/src/goals.rs` already emits OTEL metrics for goal creation,
resume, terminal transitions, token counts, and duration. As `/goal`
moves into `ext/goal`, the extension needs to preserve that telemetry
contract instead of only emitting app-visible `ThreadGoalUpdated`
events.
This keeps the existing `codex.goal.*` metric surface intact while goal
lifecycle ownership shifts toward the extension.
## What changed
- Added an extension-local `GoalMetrics` helper that records the
existing `codex.goal.*` counters and histograms through `codex-otel`.
- Threaded an optional `MetricsClient` through `install_with_backend`,
`GoalExtension`, `GoalRuntimeHandle`, and `GoalToolExecutor`.
- Emitted created, resumed, and terminal goal metrics from the extension
paths that create goals, restore active goals on thread resume, account
budget limits, complete or block goals, and handle external goal
mutations.
- Updated existing goal extension test setup callsites to pass `None`
for metrics when instrumentation is not under test.
## Verification
Not run locally.
Recent composer cleanups split state ownership out of `ChatComposer`,
but slash-command handling still mixed parsing, popup coordination,
completion, submission validation, queue behavior, and argument element
rebasing into the main composer file. Pending changes to slash command
parsing and selection inspired this code move to prevent
`chat_composer.rs` bloat.
This is just a refactor, no functional or behavioral changes are
intended.
## What changed
- Move slash-command parsing and lookup helpers into
`bottom_pane/chat_composer/slash_input.rs`.
- Move slash popup key handling, command-name completion, and popup
construction into the slash input helper module.
- Centralize bare-command, inline-args, submission-validation, and
queued-input action selection behind slash-specific helpers.
- Move command argument text-element rebasing into the slash input
module so inline command submission keeps the same element behavior with
less composer-local logic.
## Verification
- `just fmt`
- `just test -p codex-tui`
- `cargo insta pending-snapshots -p codex-tui`
## Why
The
`approving_apply_patch_for_session_skips_future_prompts_for_same_file`
integration test writes `apply_patch_allow_session.txt` under the
process cwd while exercising outside-workspace patch approval behavior.
With `just test` now being the normal validation path, that file can be
left behind in the checkout when the test runs or fails, creating
confusing untracked state.
## What changed
- Registers the resolved `apply_patch_allow_session.txt` path with
`tempfile::TempPath` before the test removes and recreates it through
`apply_patch`.
- Preserves the existing outside-workspace path shape so the approval
behavior under test does not change.
- Lets `TempPath` remove the generated file when the test exits,
including panic paths.
## Verification
- `just test -p codex-core --test all
approving_apply_patch_for_session_skips_future_prompts_for_same_file`
## Why
`codex.task.compact` only distinguished `local` vs `remote`, which made
it hard to answer simple counter questions in Statsig. Manual `/compact`
and automatic compaction were collapsed together, and the legacy remote
path was also collapsed with `remote_compaction_v2`.
## What Changed
- route `codex.task.compact` through a shared helper in
`core/src/tasks/mod.rs`
- add a `manual=true|false` tag so manual and automatic compaction can
be counted separately
- split the remote tag into `remote` and `remote_v2`
- emit the metric from the inline auto-compaction path in
`core/src/session/turn.rs` as well as the manual `CompactTask` path in
`core/src/tasks/compact.rs`
- add focused unit coverage for the new tag shapes in
`core/src/tasks/mod_tests.rs`
## Verification
- added unit coverage in `core/src/tasks/mod_tests.rs` covering manual
`remote_v2` tags and automatic `local` tags
## Why
Users who opt into named permission profiles through
`default_permissions` or `[permissions.*]` should stay in named-profile
semantics when they open `/permissions`. The legacy picker rewrites
those users into anonymous preset state, which loses the active profile
identity and hides custom configured profiles.
## What changed
- Switch `/permissions` to a profile-aware picker when profile mode is
active.
- Show friendly built-in labels instead of raw `:` profile syntax.
- Include configured custom profiles and their descriptions in the
picker.
- Route selections through the split TUI profile-selection flow below
this PR.
- Add TUI snapshots and regression coverage for built-ins, custom
profiles, and conflicting legacy runtime overrides.
## Stack
1. [#22931](https://github.com/openai/codex/pull/22931):
runtime/session/network propagation for active permission profiles.
2. [#23708](https://github.com/openai/codex/pull/23708): TUI selection
plumbing and guardrail flow.
3. **This PR**: profile-aware `/permissions` menu and custom profile
display.
## UX impact
In profile mode, `/permissions` shows the same human-facing built-ins
users already know:
```text
Default
Auto-review
Full Access
Read Only
locked-down
web-enabled
```
Selecting `locked-down` keeps `active_permission_profile =
Some("locked-down")`; selecting a built-in keeps the friendly label
while switching to its named built-in profile.
## Screenshots
Live `$test-tui` smoke screenshots uploaded through GitHub attachments:
**Profile mode with built-ins and custom profiles**
<img width="832" alt="Profile mode permissions picker with custom
profiles"
src="https://github.com/user-attachments/assets/58b72431-418c-4839-9e39-575076db4c8f"
/>
**Legacy mode remains anonymous preset picker**
<img width="1232" alt="Legacy permissions picker"
src="https://github.com/user-attachments/assets/95f413ab-4cee-411c-9afb-92580a885c97"
/>
<img width="1296" height="906" alt="image"
src="https://github.com/user-attachments/assets/ea381a78-9904-4aa2-828f-b7f2e43f60f2"
/>
<img width="705" height="207" alt="Screenshot 2026-05-18 at 2 58 00 PM"
src="https://github.com/user-attachments/assets/2fa6dd71-0296-449e-a6de-a72d78a1cb70"
/>
## Validation
- `git diff --cached --check` before commit.
- Full test run skipped at the user request while pushing the split
stack.
## Why
The memories extension already has dedicated `list`, `read`, `search`,
and `add_ad_hoc_note` tools, but app-server registration was still
disabled. The memories app collaborator needs an explicit config switch
so those native extension tools can be exposed intentionally, without
making ordinary memory prompt usage automatically register the dedicated
tool surface.
## What changed
- Added `[memories].dedicated_tools`, defaulting to `false`, to
`MemoriesToml` / `MemoriesConfig`.
- Regenerated `core/config.schema.json` for the new setting.
- Registered the memories extension as a `ToolContributor`, while
keeping tool contribution gated on both memories being enabled and
`dedicated_tools = true`.
- Added tests for the disabled default, the enabled dedicated-tools
path, and installer registration.
## Verification
- `just test -p codex-config -p codex-memories-extension`
## Why
Fixes#24502.
`codex resume --include-non-interactive` should include sessions created
by `codex exec`, but the TUI was sending no `sourceKinds` filter to
`thread/list` for that mode. `thread/list` treats omitted or empty
`sourceKinds` as interactive-only (`cli`, `vscode`), so exec sessions
were still filtered out.
## What Changed
- Added a shared TUI `resume_source_kinds` helper so both resume lookup
paths always pass explicit `sourceKinds` to `thread/list`.
- Kept the default resume behavior scoped to `cli` and `vscode`.
- Made `--include-non-interactive` include `exec` and `appServer`
sessions, while continuing to exclude subagent and unknown sources.
## Verification
Added focused coverage for both affected TUI request builders:
- `latest_session_lookup_params_can_include_non_interactive_sources`
- `remote_thread_list_params_can_include_non_interactive_sources`
## Why
The `non_prefixed_mcp_tool_names` feature should be applied where MCP
tools become model-visible, not by remapping names later in core.
Keeping the decision in `McpConnectionManager` construction makes
`ToolInfo` the single shaped view that spec building, deferred tool
search, routing, and unavailable-tool placeholders can consume directly.
This also preserves the existing external behavior while the feature is
off, and keeps the feature-on behavior for code mode and hooks explicit
at the manager boundary.
## What Changed
- Add `McpToolNameMode` to `codex-mcp` and flow it through `McpConfig`
into `McpConnectionManager::new`.
- Normalize MCP `ToolInfo` names in the manager using either
legacy-prefixed namespaces or non-prefixed namespaces; the legacy path
adds `mcp__` without restoring the old trailing namespace suffix.
- Remove the core-side MCP name remapping path so specs, tool search,
session resolution, and unavailable-tool placeholder construction use
the manager-provided `ToolName` values directly.
- Keep code mode flattening on the `__` namespace separator.
- Preserve hook compatibility by giving non-prefixed MCP hook names
legacy `mcp__...` matcher aliases.
- Add/adjust integration and unit coverage for non-prefixed code-mode
behavior, hook matching with the feature on and off, and manager-level
legacy prefixing.
## Testing
- `cargo test -p codex-mcp --lib`
- `cargo test -p codex-core --lib tools::spec::tests -- --nocapture`
- `cargo test -p codex-core --lib mcp_tools -- --nocapture`
- `cargo test -p codex-core --lib mcp_tool_exposure -- --nocapture`
- `cargo test -p codex-core --test all mcp_tool -- --nocapture`
- `cargo test -p codex-core --test all search_tool -- --nocapture`
- `cargo test -p codex-core --test all hooks_mcp -- --nocapture`
- `cargo test -p codex-core --test all
code_mode_uses_non_prefixed_mcp_tool_names_when_feature_enabled --
--nocapture`
- `cargo test -p codex-tools`
- `cargo test -p codex-features`
## Why
`ActiveTurn` already runs at most one task: starting a task requires
that no task is present, and replacement aborts existing work first.
Representing that state as an `IndexMap` leaves a multi-task shape for a
single-task invariant and makes each lifecycle lookup operate like a
collection lookup.
The slot remains optional because goal continuation uses an empty active
turn as a reservation while deciding whether to start continuation work.
## What changed
- Replace `ActiveTurn.tasks` with `task: Option<RunningTask>`.
- Update task abort/completion, session lookup and steering, input-queue
matching, goal reservation, and network-approval lookup to operate on
the singular slot.
- Mutate the singular task slot directly instead of retaining
collection-era add/remove/take helpers.
- Record token usage on the completing active task span without a
regular-task-only opt-in flag.
## Validation
- `cargo test -p codex-core --lib session::tests::steer_input`
- `cargo test -p codex-core --lib
session::tests::abort_empty_active_turn_preserves_pending_input`
- `cargo test -p codex-core --lib
session::tests::queued_response_items_for_next_turn_move_into_next_active_turn`
- `cargo test -p codex-core --lib
session::tests::active_goal_continuation_runs_again_after_no_tool_turn`
- `cargo test -p codex-core --lib
session::tests::abort_regular_task_emits_turn_aborted_only`
- `cargo test -p codex-core --lib session::input_queue::tests`
## Summary
`/mcp` in the TUI should reflect the current loaded thread, including
project-local MCP servers from that thread config. Before this change,
`mcpServerStatus/list` only read the latest global MCP config, so the
active chat could miss project-local servers.
This adds optional `threadId` to `mcpServerStatus/list`. When present,
app-server resolves the loaded thread and lists MCP status from the
refreshed effective config for that thread; when omitted, existing
global config behavior stays unchanged.
The TUI now sends the active chat thread id for `/mcp` and `/mcp
verbose`, carries that origin through the async inventory result, and
ignores stale completions if the user has switched threads before the
fetch returns. The app-server schemas were regenerated.
## Follow-up
Once this app-server API change lands, the desktop app should make the
same `threadId` plumbing so its MCP inventory also uses the current
thread config.
Fixes#23874
## Why
The goal extension already emits `ThreadGoalUpdated` events, but
production app-server thread extensions were built with the default
no-op extension event sink. That meant extension-driven goal updates
could be produced without ever reaching app-server clients.
## What changed
- Build app-server thread extensions with a host-provided
`ExtensionEventSink`.
- Add an app-server sink that converts extension `ThreadGoalUpdated`
events into `ServerNotification::ThreadGoalUpdated` broadcasts.
- Use the existing bounded outgoing message channel via `try_send` so
event forwarding cannot create an unbounded queue.
- Pass `NoopExtensionEventSink` in app-server tests that construct a
`ThreadManager` without an app-server host.
- Refresh `Cargo.lock` for the existing `codex-memories-extension`
`codex-otel` dependency.
## Verification
- `just test -p codex-app-server
extensions::tests::app_server_event_sink_forwards_thread_goal_updates`
## Why
The memories extension now receives a metrics exporter, but the useful
extension-owned signal is the memory tool call itself: which operation
ran, which memory area it touched, whether the backend call succeeded,
and whether the result was truncated.
## What changed
- Added the `codex.memories.tool.call` counter in
`ext/memories/src/metrics.rs`.
- Emit that counter from `memories/add_ad_hoc_note`, `memories/list`,
`memories/read`, and `memories/search` after backend execution.
- Tag each call with `tool`, `operation`, `scope`, `status`, and
`truncated`.
- Pass the existing `MetricsClient` through the memories extension into
the tool executors; tests use `None`.
## Verification
- `just test -p codex-memories-extension`
## Summary
- let the memories extension capture the process-global OTEL metrics
client at install time
- keep app-server/TUI/exec extension construction APIs unchanged
- store the metrics client for future memory metrics without emitting
any metrics yet
## Test plan
- `just fmt`
- `just bazel-lock-update`
- `just bazel-lock-check`
- Not run: tests/clippy per request; CI will cover them
## Why
Codex memory updates currently rely on instructions that tell agents to
create ad-hoc note files directly in the memory workspace. The memories
extension already has a `MemoriesBackend` abstraction for local storage
and future non-filesystem backends, so the ad-hoc note writer should
live behind that same interface instead of baking local filesystem
assumptions into the tool shape.
## What
- Adds a `memories/add_ad_hoc_note` tool to the existing memories tool
bundle.
- Extends `MemoriesBackend` with `add_ad_hoc_note` plus request/response
types so remote memory stores can implement the same operation later.
- Implements the local backend by creating append-only notes under
`extensions/ad_hoc/notes`.
- Validates the tool-provided filename contract
(`YYYY-MM-DDTHH-MM-SS-<slug>.md`), rejects path-like filenames, rejects
empty notes, and uses create-new semantics so existing notes are never
overwritten.
- Keeps memories tool contribution behind the existing commented-out
registration path; this defines the tool surface without newly exposing
it through app-server.
## Test Plan
- `just test -p codex-memories-extension`
## Why
The memories extension now owns the read-path developer instructions it
injects at thread start. Keeping that prompt builder and template in
`codex-memories-read` left the extension depending on a helper crate for
extension-specific prompt assembly, and kept async template/truncation
dependencies in the read crate after the remaining read surface no
longer needed them.
## What changed
- Moved `prompts.rs`, its tests, and `templates/memories/read_path.md`
from `memories/read` into `ext/memories`.
- Wired `MemoryExtension` to call the local prompt builder and added the
moved templates to `ext/memories/BUILD.bazel` compile data.
- Removed the now-unused prompt export and prompt-related dependencies
from `codex-memories-read`.
## Testing
- Not run locally.
## Why
The memory read-tool surface had two implementations: the app-server
extension path under `ext/memories`, and an unused `codex-memories-mcp`
workspace crate under `memories/mcp`. The MCP crate no longer has
reverse dependents, so keeping it around preserves duplicate backend,
schema, and tool code that is not part of the live app-server memory
path.
Dropping the orphaned crate makes the remaining memory crate split
clearer: `memories/read` owns read-path prompt/citation helpers,
`memories/write` owns the write pipeline, and `ext/memories` owns the
app-server extension integration.
## What changed
- Removed the `memories/mcp` crate and its Bazel/Cargo metadata.
- Removed `memories/mcp` from the Rust workspace and lockfile.
- Updated `memories/README.md` so it only lists the remaining reusable
memory crates.
## Verification
- `cargo metadata --format-version 1 --no-deps` succeeds.
## Why
#23951 added remote compaction v2 retries, but it left the retry and WS
-> HTTPS fallback behavior duplicated between normal Responses turns and
compaction. This follow-up centralizes the common retry handling so
future changes to fallback, retry delay, retry notifications, and retry
sleep do not have to be kept in sync across both callsites.
## What changed
- Added `core/src/responses_retry.rs` with a shared handler for
retryable Responses stream errors.
- Reused that handler from normal turn sampling and remote compaction
v2.
- Kept each callsite responsible for its retry budget: normal turns
still use `stream_max_retries`, while compaction v2 still uses
`min(stream_max_retries, 2)`.
- Preserved caller-specific behavior around non-retryable errors,
context-window errors, usage-limit errors, and compact-specific final
failure logging.
The shared handler now owns:
- WS -> HTTPS fallback warning emission
- retry delay selection, including server-requested stream retry delay
- retry logging
- first-WebSocket-retry notification suppression
- `Reconnecting... n/max` stream-error notification
- sleeping before the next retry attempt
## Verification
- `cargo test -p codex-core remote_compact_v2`
- `cargo test -p codex-core websocket_fallback`
- `just fix -p codex-core`
Did not run the full workspace test suite.
---------
Co-authored-by: jif-oai <jif@openai.com>
## Why
The old config-profile mechanism should no longer influence runtime
behavior now that profile selection has moved to file-based `--profile`
config files. Core already rejects a selected legacy `profile = "..."`
with a migration error in
[`core/src/config/mod.rs`](d6451fcb79/codex-rs/core/src/config/mod.rs (L2521-L2529)),
but a few residual consumers still read legacy `[profiles.*]` data while
performing managed-feature checks and personality migration.
That kept dead legacy profile state relevant after selection had been
removed, and could make personality migration depend on a stale or
missing old profile.
## What changed
- Stop scanning legacy `[profiles.*]` feature settings when validating
managed feature requirements.
- Make personality migration consider only top-level `personality` and
`model_provider` settings.
- Remove the now-unused `ConfigToml::get_config_profile` helper.
- Update personality migration coverage to verify that legacy profile
personality fields and missing legacy profile names no longer affect
that migration path.
This keeps the legacy `profile` / `profiles` config shape available for
the remaining compatibility and migration diagnostics; it only removes
these behavior consumers.
## Verification
- Updated `core/tests/suite/personality_migration.rs` for the new
legacy-profile behavior.
- Focused test command: `cargo test -p codex-core
personality_migration`.
## Why
Refs #24425.
We have seen rollout JSONL corruption that appears consistent with a
rollout write failing after partially appending a line, followed by a
retry that appends the same item again. The available user logs did not
include the underlying OS error, so it is hard to tell whether the
trigger was `ENOSPC`, quota exhaustion, a filesystem error, or something
else.
This PR adds the missing diagnostics for future reports.
## What changed
- Include `ErrorKind` and `raw_os_error()` in rollout writer failure
logs.
- Preserve the existing append-only rollout write path; this PR is
diagnostic-only.
## Verification
- `just test -p codex-rollout`
## Summary
Follow-up to #24459 and partial behavioral revert of `a71fc47` / #16699.
- Stop removing `MallocStackLogging*` and `MallocLogFile*` from macOS
pre-main hardening.
- Remove documentation that claims Codex suppresses those allocator
diagnostic controls.
- Retain the shared `remove_env_vars_with_prefix` refactor and existing
`LD_` / `DYLD_` hardening.
## Why
#24459 fixes the composer-corruption problem at the terminal stderr
boundary while preserving redirected stderr. With that guard in place,
stripping macOS malloc diagnostic settings is unnecessary and can hide
diagnostics intentionally enabled by callers.
## Validation
- `just fmt`
- `just test -p codex-process-hardening`
- `just argument-comment-lint-from-source -p codex-process-hardening`
- `git diff --check`
## Why
Fixes#17139.
On macOS, runtime diagnostics such as `MallocStackLogging` messages can
be written directly to process stderr while the inline TUI owns the
terminal. Those bytes paint into the same viewport as the composer
without passing through the renderer or composer state, making
diagnostic output appear to leak into the input area.
## What Changed
- Add a macOS terminal stderr guard while the inline TUI owns the
viewport.
- Restore stderr when Codex returns terminal ownership for external
interactive programs, suspend/resume, panic handling, and normal
shutdown.
- Add an fd-level regression test that verifies output is suppressed
only while terminal ownership is held and restored at each handoff
boundary.
## How to Test
1. On macOS, launch the interactive TUI and leave the composer visible.
2. Exercise the workflow that triggers an allocator/runtime stderr
diagnostic during an active session, as reported in #17139.
3. Confirm the diagnostic no longer overwrites the active composer
region.
4. Suspend or exit the TUI and confirm subsequent terminal stderr output
remains visible.
The platform diagnostic is environment-dependent, so the deterministic
regression check is the new fd-lifecycle test in
`tui::terminal_stderr::tests::suppresses_stderr_only_while_terminal_is_owned`.
Targeted validation:
- `just argument-comment-lint-from-source -p codex-tui` passed.
- `just test -p codex-tui` exercised and passed the new stderr-guard
regression test. The full invocation currently fails in two unrelated
guardian-policy tests,
`update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
and
`update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`,
which reproduce when rerun in isolation.
## Why
Numbered Markdown findings become hard to scan when long items visually
run together or when wrapped explanatory paragraphs lose their list
indentation. This is especially visible in review output: the next
number can look attached to the previous finding, and paragraph
continuation rows can jump back toward the left margin instead of
staying grouped beneath their item.
<table><tr><td>
<center>Before</center>
<img width="1718" height="836" alt="CleanShot 2026-05-24 at 14 00 49"
src="https://github.com/user-attachments/assets/f1ee0023-50fa-4f81-a641-ae08b17b99bd"
/>
</td></tr>
<tr><td>
<center>After</center>
<img width="1714" height="906" alt="image"
src="https://github.com/user-attachments/assets/b123a5e0-a232-47bf-96d5-c935295f7c0a"
/>
</td></tr>
</table>
## What Changed
- Insert a blank separator before a sibling list item when the previous
item occupies more than one rendered line.
- Preserve compact rendering for lists whose sibling items each render
on one line.
- Preserve list-body leading whitespace when transient streamed
assistant rows require another wrapping pass for history display, so
wrapped paragraphs stay aligned beneath their item.
- Share the existing leading-whitespace prefix logic used by history
insertion instead of introducing a second indentation rule.
- Keep streamed Markdown output aligned with completed rendering and add
snapshots for findings-style spacing and streamed paragraph indentation.
## How to Test
1. Start Codex from this branch and open the recorded repro session
`019e563f-7d58-7ff2-8ec7-828f20fa61ca`.
2. Inspect the numbered `Findings` list whose items contain explanatory
paragraphs.
3. Confirm each multiline finding is separated from the next numbered
finding by one blank line.
4. Confirm wrapped rows of each indented paragraph remain aligned
beneath the finding body, rather than returning to the left edge.
5. Render a short one-line numbered or unordered list and confirm its
items remain compact without added blank rows.
Targeted tests:
- `just test -p codex-tui history_cell insert_history markdown_render
markdown_stream streaming::controller`
- `just argument-comment-lint-from-source -p codex-tui`
## Related Work
PR #24346 changes Markdown table column allocation in parallel. This PR
is intentionally limited to list-item readability and history wrapping;
both branches touch `codex-rs/tui/src/markdown_render.rs`, so a small
merge conflict may need resolution depending on merge order.
## Why
Markdown tables with a long path-heavy column could allocate almost all
available width to that column and collapse neighboring prose columns to
only a few characters. In rollout summaries this made `Unit` and `What
It Adds` difficult to read, even though the long `Files` values were the
content best suited to wrapping.
The affected example also specified `Files` as right aligned in its
markdown delimiter (`---:`). This change preserves that requested
alignment while improving how width is distributed.
| Before | After |
|---|---|
| <img width="1709" height="764" alt="image"
src="https://github.com/user-attachments/assets/932ab21c-b72d-48a2-9aad-b69da87a0968"
/> | <img width="1711" height="855" alt="image"
src="https://github.com/user-attachments/assets/4028bd20-2228-4c2f-be8a-1866325b7f62"
/> |
## What Changed
- Classify table columns as narrative, token-heavy, or compact during
width allocation.
- Shrink token-heavy path and URL columns before shrinking narrative
prose, while preserving compact counts and short labels longest.
- Use readable soft floors for narrative and token-heavy content before
falling back to tighter layouts.
- Add snapshot coverage for a rollout-shaped table containing
right-aligned file paths and prose columns.
## How to Test
1. Render a markdown table with `Unit`, right-aligned `Files`, `Adds`,
`Removes`, and `What It Adds` columns at a constrained terminal width.
2. Put long repository paths in `Files` and sentence-length content in
`Unit` and `What It Adds`.
3. Confirm that `Files` remains right aligned but wraps before the
narrative columns become unreadable.
4. Confirm that the compact numeric columns remain easy to scan.
Targeted tests:
- `just test -p codex-tui markdown_render`
Validation note: `just test -p codex-tui` was also attempted and reached
two existing unrelated failures in
`app::tests::update_feature_flags_disabling_guardian_*`; the markdown
rendering regression test passes in the targeted run.
## Why
Users have been reporting missing sessions in the app. The app server
thread listing is backed by the SQLite state DB, but the durable source
of truth for a thread still exists on disk as rollout JSONL. When the
state DB is incomplete, doctor should be able to show the mismatch
directly instead of leaving users with a generic state health result.
## What changed
This adds a `threads` doctor check that compares active and archived
rollout files under `CODEX_HOME` with rows in the SQLite `threads`
table. The check reports missing rollout rows, stale DB rows, archive
flag mismatches, duplicate rollout thread IDs, duplicate DB paths,
source/provider summaries, and bounded samples of affected rollout
paths.
It also adds a read-only state audit helper in `codex-rs/state` so
doctor can inspect thread rows without creating, migrating, or repairing
the database.
## Sample output
```text
⚠ threads rollout files are missing from the state DB
default model provider openai
rollout DB active files 3910
rollout DB archived files 2037
rollout DB scan errors 0
rollout DB malformed file names 0
rollout DB scan cap reached false
rollout DB rows 5499
rollout DB active rows 3462
rollout DB archived rows 2037
rollout DB missing active rows 448
rollout DB missing archived rows 0
rollout DB stale rows 0
rollout DB archive mismatches 0
rollout DB duplicate rollout thread ids 0
rollout DB duplicate DB paths 0
rollout DB model providers openai=5359, lmstudio=35, mock_provider=33, lite_llm=26, proxy=26, ollama=15, lms=4, local-usage-limit=1
rollout DB sources vscode=2587, cli=1494, subagent:thread_spawn=577, subagent:other=502, exec=281, subagent:memory_consolidation=46, subagent:review=9, unknown=3
rollout DB missing active sample ~/.codex/sessions/2026/0…857e-a923c712e066.jsonl
rollout DB missing active sample ~/.codex/sessions/2025/0…877a-766dff25c68d.jsonl
rollout DB missing active sample ~/.codex/sessions/2025/0…a8b1-7bbadc836f6e.jsonl
rollout DB missing active sample ~/.codex/sessions/2025/0…a218-e6197f3f62f8.jsonl
rollout DB missing active sample ~/.codex/sessions/2025/0…9011-7e30784f9932.jsonl
```
## Summary
The TUI `/mcp` inventory flow should reflect the app server’s MCP status
response. It was also joining those results with the TUI process’s local
`config.mcp_servers`, which can diverge once MCP state is owned by a
remote app server and cause stale local command, URL, status, or
empty-state details to render.
This change removes the local config join from the app-server-backed
inventory renderer. The TUI now renders directly from the existing
`mcpServerStatus/list` payload and treats an empty status response as
the empty MCP inventory state.
## Known limitation
The existing `mcpServerStatus/list` payload does not include
disabled-state or disabled-reason fields. To preserve the current
app-server API, this PR does not try to infer that state from
client-local config. If remote `/mcp` needs to show disabled/reason
details again, that should come from app-server-owned status data in a
follow-up.
Related to #22914, #22915, and #22916.
## Why
TUI onboarding trusted-project persistence should go through the same
app-server config write path as other config mutations. Writing
`config.toml` directly from the trust widget bypasses that layer and can
let onboarding proceed even when the trust decision was not actually
persisted.
## What changed
- Added a TUI config helper that writes the existing project trust
structure through `config/batchWrite`.
- Persists trust decisions as `projects.<project>.trust_level =
"trusted"` using the existing project trust key helper.
- Changed the trust directory widget to only record the user selection;
onboarding performs the app-server write before reporting success.
- Keeps the user on the trust screen and shows an error if app-server
persistence fails.
## Verification
- `cargo test -p codex-tui --lib
trust_persistence_failure_keeps_trust_step_in_progress`
- `cargo test -p codex-tui --lib
trusted_project_edit_targets_project_trust_level`
- Manual: built the local `codex-cli`, accepted the trust prompt in a
temp project, confirmed `projects.<project>.trust_level = "trusted"`,
and simulated an unwritable config to verify onboarding stays on the
trust screen without writing trust.
## Summary
Manual provider selection during `codex --oss` startup was still
persisting `oss_provider` through the legacy local `config.toml` writer.
That bypasses the app-server-owned config mutation path used by the TUI,
so this routes the write through the app server config API instead.
The net behavior is intentionally narrow: only an interactive picker
selection is persisted. Auto-detected single-running-provider startup
and explicit `--local-provider` startup remain ephemeral, so merely
having one backend running does not make that provider sticky for future
runs.
## What Changed
- Removed the TUI picker’s direct dependency on
`set_default_oss_provider`.
- Had `oss_selection` report whether the returned provider came from the
interactive picker.
- Carried only manually selected providers into startup persistence.
- Wrote `oss_provider` via `config/batchWrite` once the app server
session is available.
- Logged a warning and continued startup if the app-server config write
fails.
## Verification
Manually smoke-tested the real `codex-tui` binary with a temporary
`CODEX_HOME`, pseudo-terminal input, and a fake LM Studio HTTP server:
- Interactive picker selection persisted `oss_provider = "lmstudio"`.
- Non-picker `--local-provider lmstudio` startup did not persist
`oss_provider`.
Fixes#24093.
## Why
`--dangerously-bypass-hook-trust` is a supported CLI flag intended for
headless or automated runs where enabled hooks should be allowed to run
without requiring persisted trust. In the TUI, startup hook review still
opened whenever hooks looked untrusted, so a launch using the bypass
could block on the interactive "Hooks need review" prompt.
The tricky case is persistent app-server resume: a resume may attach to
an already-running thread, where resume config overrides are ignored. In
that path, hiding the startup review would be wrong because the existing
hook engine may still filter untrusted hooks.
## What Changed
- Startup hook review now skips the prompt only when hook trust bypass
is actually safe for that launch.
- The TUI forwards `bypass_hook_trust` through the app-server request
config for fresh thread start/resume/fork paths, and the app-server
applies it as a runtime-only `ConfigOverrides` value rather than
treating it like a `config.toml` setting.
- Persistent app-server resumes keep the startup review prompt so users
still have a chance to trust hooks when the running thread cannot
receive the bypass override.
## Verification
- Added focused coverage for startup hook review with and without
`bypass_hook_trust`.
- Extended existing TUI/app-server config override tests to cover
forwarding and applying `bypass_hook_trust`.
## Summary
Fixes#24411.
`/status` currently has no way to show when the TUI is talking to Codex
through a remote transport. That makes embedded local sessions, local
daemon sessions, and true remote sessions look the same, and it hides
the remote server version when debugging connection-specific behavior.
This PR adds a single `Remote` row for non-embedded connections only.
The row shows the sanitized connection address and a dimmed version
parenthetical, preserving the existing status output for embedded local
sessions.
<img width="791" height="144" alt="image"
src="https://github.com/user-attachments/assets/529d7940-1c45-4586-8b06-f20a1f04b771"
/>
## Verification
- Manually validated when connecting remotely (either implicitly to
local daemon or explicitly)
## Summary
The compact TUI status line already renders rate-limit percentages as
remaining capacity, but the text did not say so. That made high-usage
red indicators ambiguous because values like `weekly 6%` could be read
as either used or remaining.
This PR labels the compact rate-limit values explicitly as `left` across
the status line, terminal title, and setup previews.
Addresses #24274
## Why
We are seeing cases where users have an old background app-server still
running. `codex doctor` already reports background server state, but
without the running app-server version it is harder to diagnose
behaviors that depend on the daemon build.
## What changed
- Reused the app-server daemon's passive initialize probe through a
narrow `probe_app_server_version` helper.
- Updated the `codex doctor` Background Server section to report
`app-server version: <version>` when the socket is reachable.
- Preserved the not-running OK behavior and report `app-server version:
unavailable (<short error>)` when a socket exists but the passive probe
fails.
## Why
Issue #23031 was hard to diagnose from existing `codex doctor` output
because support could not see the OS language, resolved Git install, Git
repo metadata, Windows console mode/code page, or terminal-title inputs
that affect the TUI startup path. This adds those read-only signals to
`codex doctor` so Windows, Linux, and macOS reports carry the context
needed to investigate similar terminal rendering regressions.
Refs #23031
## What Changed
- Add a `system.environment` check for OS type/version, OS language, and
locale env vars.
- Add a `git.environment` check for the selected Git executable, PATH
Git candidates, version, exec path/build options, repository root,
branch, `.git` entry, and `core.fsmonitor`.
- Add Windows console code page and VT-processing mode details to
terminal diagnostics.
- Add a `terminal.title` check for configured/default title items and
resolved project-title source/value.
- Surface startup warning counts in config diagnostics and teach human
output to render the new categories.
## How to Test
1. On Windows, check out this branch and run `cargo run -p codex-cli --
doctor --summary`.
2. Confirm the Environment section includes `system`, `git`, `terminal`,
and `title` rows.
3. Run `cargo run -p codex-cli -- doctor --json`.
4. Confirm the JSON contains `system.environment`, `git.environment`,
and `terminal.title`; on Windows, confirm `terminal.env` details include
console code pages and `VT processing` for stdout/stderr.
5. From a non-git directory, run the same `doctor --json` command and
confirm the Git check reports `repo detected: false` rather than
warning.
Targeted tests:
- `cargo test -p codex-cli doctor`
- `cargo test -p codex-cli`
Move plugin tar.gz packing and unpacking into a shared core-plugins
archive helper so uploaded bundles are decoded through the same tar
handling used for installs. This removes duplicate archive logic,
supports GNU long-name entries on extraction, and keeps size, traversal,
link, and entry-type checks in one place.
## Summary
Change code-mode stored value updates to merge writes by key instead of
replacing the session's complete stored-value map after each cell
completes.
Previously, each cell received a snapshot of stored values and returned
the complete resulting map. When multiple cells ran concurrently, a
later completion could overwrite values written by another cell because
it committed an older snapshot.
This change moves stored-value ownership into `CodeModeService`:
- Each runtime starts from the service's current stored values.
- Runtime completion reports only keys written by that cell.
- The service merges those writes into the current stored-value map on
successful completion.
- Core no longer replaces its stored-value state from a cell result.
As a result, concurrently executing cells can update different stored
keys without clobbering one another.
The move into CodeModeService is motivated by a desire to have this
lifetime tied to a new lifetime object on that side in a subsequent PR.
# Why
`PreToolUse`, `PostToolUse`, and `updatedInput` coverage for local
function tools currently depends on each handler remembering to wire up
the hook contract itself. That makes coverage easy to miss as new
function tools are added, even though most of them share the same basic
shape: a model-facing function call with JSON arguments.
# What
This makes `CoreToolRuntime` provide the default hook contract for
ordinary local function tools:
- build generic `PreToolUse` and `PostToolUse` payloads from the
function tool name and arguments
- apply `updatedInput` rewrites back into function-tool arguments
through the same default path
- let tool outputs override the post-hook input or response when they
have a more stable hook-facing contract
The exceptions stay explicit:
- hosted tools remain outside the generic local function path
- code-mode `wait` and `write_stdin` opt out for now
- `PostToolUse` feedback replaces only the model-visible response, so
code mode keeps its typed tool result
With the generic path in place, the MCP and extension-tool adapters no
longer need their own duplicate pre/post hook plumbing. The new coverage
exercises the registry default plus end-to-end local function behavior
for pre-hook blocking, `updatedInput` rewriting, and post-hook context.
## Why
The package layout gives Codex a stable place for runtime helpers that
should travel with the entrypoint. `shell_zsh_fork` still required users
to configure `zsh_path` manually, even though we already publish
prebuilt zsh fork artifacts.
This PR builds on #24129 and uses the shared DotSlash artifact fetcher
to include the zsh fork in Codex packages when a matching target
artifact exists. Packaged Codex builds can then discover the bundled
fork automatically; the user/profile `zsh_path` override is removed so
the feature uses the package-managed artifact instead of a legacy path
knob.
## What Changed
- Added `scripts/codex_package/codex-zsh`, a checked-in DotSlash
manifest for the current macOS arm64 and Linux zsh fork artifacts.
- Taught `scripts/build_codex_package.py` to fetch the matching zsh fork
artifact and install it at `codex-resources/zsh/bin/zsh` when available
for the selected target.
- Added package layout validation for the optional bundled zsh resource.
- Added `InstallContext::bundled_zsh_path()` and
`InstallContext::bundled_zsh_bin_dir()` for package-layout resource
discovery.
- Threaded the packaged zsh path through config loading as the runtime
`zsh_path` for packaged installs, and removed the config/profile/CLI
override path.
- Kept the packaged default zsh override typed as `AbsolutePathBuf`
until the existing runtime `Config::zsh_path` boundary.
- Updated app-server zsh-fork integration tests to spawn
`codex-app-server` from a temporary package layout with
`codex-resources/zsh/bin/zsh`, matching the new packaged discovery path
instead of setting `zsh_path` in config.
- Switched package executable copying from metadata-preserving `copy2()`
to `copyfile()` plus explicit executable bits, which avoids macOS
file-flag failures when local smoke tests use system binaries as inputs.
## Testing
To verify that the `zsh` executable from the Codex package is picked up
correctly, first I ran:
```shell
./scripts/build_codex_package.py
```
which created:
```
/private/var/folders/vw/x2knqmks50sfhfpy27nftl900000gp/T/codex-package-pms94kdp/
```
so then I ran:
```
/private/var/folders/vw/x2knqmks50sfhfpy27nftl900000gp/T/codex-package-pms94kdp/bin/codex exec --enable shell_zsh_fork 'run `echo $0`'
```
which reported the following, as expected:
```
/private/var/folders/vw/x2knqmks50sfhfpy27nftl900000gp/T/codex-package-pms94kdp/codex-resources/zsh/bin/zsh
```
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23756).
* #23768
* __->__ #23756
## Why
Remote-control websocket reconnects currently use the shared exponential
backoff helper without a local ceiling, so a long failure streak can
stretch retries out indefinitely and leave the runtime behavior hard to
inspect from logs.
## What Changed
Cap the remote-control reconnect delay at 30 seconds, then reset the
reconnect attempt counter once that capped delay is emitted so the next
failure starts from the initial jittered delay again.
The reconnect failure log now records the attempt number, chosen delay,
and whether the cap triggered a reset, with a separate info log when the
backoff counter is reset after the cap.
## Verification
`just test -p codex-app-server-transport`
Related issue: N/A
## Why
The zsh release workflow currently publishes macOS arm64 and Linux zsh
fork artifacts, but no macOS x64 artifact. The Codex package builder
therefore cannot include codex-resources/zsh/bin/zsh for
x86_64-apple-darwin packages.
## What Changed
- Added an x86_64-apple-darwin row to the macOS zsh release matrix.
- Runs that row on macos-15-large, the Intel macOS runner appropriate
for the native zsh build.
- Added the matching macos-x86_64 platform to the zsh DotSlash publish
config so the generated release manifest can reference the new tarball.
## Why
`openai/openai#947613` adds `X-Codex-Rate-Limit-Reached-Type` for Codex
workspace credit-depletion and spend-cap responses. The CLI currently
reads the adjacent promo header but otherwise renders generic
usage-limit copy, so those responses do not explain the
workspace-specific action the user needs to take.
Backend dependency: https://github.com/openai/openai/pull/947613
## What Changed
- Parse `X-Codex-Rate-Limit-Reached-Type` in the usage-limit error
handling path alongside `x-codex-promo-message`.
- Keep the header value parsing with the shared `RateLimitReachedType`
enum.
- Carry the parsed type on `UsageLimitReachedError` and render
client-owned copy for the four workspace owner/member credit and
spend-cap values.
- Preserve existing promo and plan-based text for absent, generic, or
unknown header values.
- Keep the existing TUI workspace-owner nudge state path unchanged; the
response header only selects the displayed error string.
- Add focused display coverage for all specific type values and the
generic fallback case.
## Test Plan
- Added `usage_limit_reached_error_formats_rate_limit_reached_types`
coverage.
- Not run manually, per request; CI runs validation on the pushed
commit.
## Why
The turn loop no longer needs to decide when a `ModelClientSession`
should reset its websocket state after compaction. That reset behavior
belongs inside the model client, where the websocket cache and retry
state are owned. The repo guidance now calls this out explicitly so
future changes let the incremental request logic decide whether the
previous request can be reused.
## What Changed
- Removed the `reset_client_session` return value from pre-sampling and
auto-compact helpers in `core/src/session/turn.rs`.
- Changed compaction helpers to return `CodexResult<()>` so callers only
handle success or failure.
- Made `ModelClientSession::reset_websocket_session` private to
`core/src/client.rs`, leaving it callable only from model-client
internals.
- Added `AGENTS.md` guidance not to call `reset_client_session`
unnecessarily.
## Validation
- `just test -p codex-core session::turn`
## Why
Before changing the Codex Bridge JSON schema policy, add integration
coverage around real connector-like MCP tool schemas. The existing unit
tests cover individual sanitizer behaviors, but they do not make it easy
to see whether full fixture schemas keep model-visible guidance, prune
only unreachable definitions, drop unsupported JSON Schema fields, and
stay within the Responses API schema budget.
## What Changed
- Added `tools/tests/json_schema_policy_fixtures.rs`, which converts MCP
tool fixtures through `mcp_tool_to_responses_api_tool` and validates the
resulting Responses tool parameters.
- Added connector-style fixtures for Slack, Google Calendar, Google
Drive, Notion, and Microsoft Outlook Email under
`tools/tests/fixtures/json_schema_policy/`.
- Added fixture assertions for preserved guidance, pruned definitions,
expected field drops after `JsonSchema` conversion, marker count
baselines, and dangling local `$ref` prevention.
- Added a real oversized golden Notion `create_page` input schema
fixture to exercise the compaction path that strips descriptions, drops
root `$defs`, rewrites local refs, and fits the compacted schema under
the budget.
## Summary
- add Divan benchmarks for prompt image re-encoding paths
- wire the image benchmark smoke test into Rust CI workflows
## Why
Image prompt handling includes re-encoding work that benefits from
repeatable benchmark coverage so changes can be measured in CI and
locally.
This already helped identify a potential regression from changing compiler flags.
## Impact
Developers can run and compare the new image re-encoding benchmarks, and
CI exercises the benchmark target via the Rust benchmark smoke test.
## Why
The idea here is to erase the difference between initial and followup
inputs to a turn. Followup inputs are already represented as TurnInput.
Eventual goal is not to have explicit on task input at all and pull
everything from input Q.
## What Changed
- Changes `SessionTask::run` and the erased `AnySessionTask::run` path
to accept `Vec<TurnInput>`.
- Wraps user-submitted spawn input as `TurnInput::UserInput` at the
session task start boundary.
- Updates `run_turn` to record initial `TurnInput` using the same hook
and recording path used for pending input.
- Keeps review-specific conversion local to `ReviewTask`, where the
sub-Codex one-shot API still expects `Vec<UserInput>`.
- Moves the synthetic compact prompt into `CompactTask` and starts
compact tasks with empty task input.
## Validation
- `cargo check -p codex-core`
- `just test -p codex-core -E
'test(task_finish_emits_turn_item_lifecycle_for_leftover_pending_user_input)
| test(queued_response_items_for_next_turn_move_into_next_active_turn) |
test(steered_input_reopens_mailbox_delivery_for_current_turn)'`
## Why
The package builder already fetches `rg` from a checked-in DotSlash
manifest. The zsh packaging work needs the same
fetch/cache/size-check/SHA-256/extract path for another manifest, but
keeping that refactor inside the zsh PR makes the review harder to
follow.
This PR factors the existing `rg`-specific implementation into a
reusable helper with no intended behavior change for `rg` packaging.
## What Changed
- Added `scripts/codex_package/dotslash.py` for checked-in DotSlash
manifest parsing, archive download, cache reuse, size validation,
SHA-256 validation, and member extraction.
- Updated `scripts/codex_package/ripgrep.py` to delegate to the shared
helper.
- Preserved the existing `rg` manifest path, cache key, destination
filename, and executable-bit behavior.
## Testing
- `python3 -m py_compile scripts/codex_package/dotslash.py
scripts/codex_package/ripgrep.py scripts/codex_package/cli.py
scripts/codex_package/layout.py scripts/codex_package/zsh.py`
- `python3 -m unittest discover scripts/codex_package`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24129).
* #23768
* #23756
* __->__ #24129
## What changed
- Add a distinct `responses_compaction_v2` value for
`CodexCompactionEvent.implementation`.
- Emit that value from the remote compaction v2 path.
- Keep local compaction as `responses` and legacy `/responses/compact`
as `responses_compact`.
## Why
Remote compaction v2 and local prompt-based compaction were both
reported as `responses`, which made the analytics table collapse two
different compaction mechanisms into one implementation bucket.
## Validation
- `just fmt`
- `just test -p codex-analytics`
`just test -p codex-core` was started locally, but this PR is
intentionally being pushed for CI to finish the remaining validation.
## Why
Standalone image generation needs a typed `codex-api` client surface for
the Codex image proxy routes before the harness and model-facing tool
layers are wired in.
## What changed
- Added `ImagesClient` support for JSON `images/generations` and
`images/edits` requests.
- Added typed request and response shapes for generation, JSON edit
image URLs, image metadata, and base64 image outputs.
- Kept generation model slugs open-ended while requiring the generation
model field that the downstream endpoint expects.
- Exported the new client and image types from `codex-api`.
- Added coverage for generation and edit wire shapes, extra response
metadata that the client ignores, and malformed image responses missing
`data`.
## Validation
- `cargo test -p codex-api`
- `just fix -p codex-api`
- `just fmt`
- `git diff --check main`
## Summary
- add `--oauth-client-id` and `--oauth-resource` options for streamable
HTTP `codex mcp add` registrations
- persist those options in MCP server config and use them during the
immediate OAuth login flow
- cover add-time serialization of both OAuth options in the CLI
integration tests
## Testing
- `just fmt`
- `cargo test -p codex-cli`
- `just fix -p codex-cli`
## Why
[Recent PR](https://github.com/openai/codex/pull/22709) removed
`trace_id` from `TurnContextItem`.
## What changed
- Add to `TurnStartedEvent` so rollout consumers can correlate turns
with telemetry traces.
- Note that the branch name is out of date because I originally re-added
to `TurnContextItem`, but we decided to move it to `TurnStartedEvent`.
## Verification
- `cargo test -p codex-protocol`
- `cargo test -p codex-core --lib
regular_turn_emits_turn_started_without_waiting_for_startup_prewarm`
- `cargo test -p codex-core --test all
emits_warning_when_resumed_model_differs`
- `cargo test -p codex-rollout`
- `cargo test -p codex-state`
## Why
`codex sandbox` now always runs the host sandbox backend, so it should
accept the same profile selection mechanism as the rest of the runtime
CLI surface. Without `--profile`, sandbox debugging can exercise only
the default config stack unless users manually translate profile config
into ad hoc `-c` overrides.
Supporting `--profile` lets sandbox invocations load
`$CODEX_HOME/<name>.config.toml`, including permission profile
configuration, before resolving the sandbox policy for the command being
run.
## What Changed
- Added `--profile NAME` / `-p NAME` to the host-specific `codex
sandbox` argument structs as `config_profile`.
- Allowed root-level `codex --profile NAME sandbox ...` and made a
sandbox-local `codex sandbox --profile NAME ...` override the root
selection.
- Threaded `LoaderOverrides` through sandbox config loading so selected
config profile files participate in permission resolution before the
legacy read-only fallback.
- Documented the new sandbox flag in `codex-rs/README.md`.
## Verification
- Added parser coverage for `codex sandbox --profile`.
- Added sandbox config-loader coverage that verifies selected config
profile loader overrides select the profile config rather than falling
back to read-only.
- Ran `cargo test -p codex-cli`.
## Why
Older Git for Windows versions can leave the Windows console output mode
without virtual terminal processing after Codex runs git metadata
commands in a repository. When the TUI later emits ANSI control
sequences for redraws, restore, or image rendering, Windows Terminal can
show raw escape bytes or leave the prompt/status area corrupted.
This is a targeted mitigation for the repo-conditioned Windows rendering
corruption reported in #23888 and related reports #23512 and #23628.
Updating Git avoids the trigger for affected users, but Codex should
also reassert the terminal mode before it writes TUI control sequences.
| Before | After |
|---|---|
| <img width="2100" height="1359" alt="CleanShot 2026-05-22 at 11 23 21"
src="https://github.com/user-attachments/assets/3218c379-5f97-4c71-ab25-805c9d20578a"
/> | <img width="2100" height="1359" alt="CleanShot 2026-05-22 at 11 23
58"
src="https://github.com/user-attachments/assets/55ac72bb-37d0-400e-99bc-12dd5ea4092d"
/> |
## What Changed
- Re-enable Windows virtual terminal processing for stdout and stderr
before TUI mode setup, restore, redraw, resume, and pet image render
paths.
- Treat invalid, null, or non-console handles as no-ops so redirected or
non-console output is unaffected.
- Keep the helper as a no-op on non-Windows platforms.
## How to Test
1. On Windows Terminal with a Git 2.28.0 for Windows install, start
Codex inside a valid Git repository.
2. Start a new Codex CLI session.
3. Confirm the prompt, working indicator, and bottom status line remain
readable instead of showing raw ANSI escape sequences.
4. Repeat outside a Git repository to confirm the ordinary non-repo
startup path is unchanged.
Targeted tests:
- Not run locally; the behavior depends on Windows console mode APIs and
the current worktree is on macOS.
Now that users can install via `curl` (or `irm`), we should tell them
about it so they no longer need to use `npm`!
Note that on one Windows machine I tested on, when I ran:
```
irm https://chatgpt.com/codex/install.ps1 | iex
```
I got this error:
```
iex : The property 'OSArchitecture' cannot be found on this object. Verify that the property exists.
At line:1 char:45
+ irm https://chatgpt.com/codex/install.ps1 | iex
+ ~~~
+ CategoryInfo : NotSpecified: (:) [Invoke-Expression], PropertyNotFoundException
+ FullyQualifiedErrorId : PropertyNotFoundStrict,Microsoft.PowerShell.Commands.InvokeExpressionCommand
```
so we'll recommend the following that works from both `cmd.exe` and
PowerShell:
```
powershell -ExecutionPolicy ByPass -c "irm https://chatgpt.com/codex/install.ps1 | iex"
```
This PR makes a slight update to `codex-rs/tui/src/update_action.rs` to
match.
## Why
Windows sandbox diagnostics currently append to a single `sandbox.log`
under `CODEX_HOME/.sandbox`. That file never rolls over, which makes it
hard to safely include sandbox diagnostics in future feedback reports
without risking unbounded growth.
## What changed
- Replaced direct append-open sandbox logging with
`tracing_appender::rolling::RollingFileAppender`.
- Configured sandbox logs to rotate daily using names like
`sandbox.YYYY-MM-DD.log`.
- Added a conservative `MAX_LOG_FILES` cap of 90 retained matching log
files.
- Routed the Windows sandbox setup helper through the same rolling
writer.
- Added helpers for resolving the current daily sandbox log path so
future feedback upload work can use the same filename logic.
- Updated tests and test diagnostics to read the dated daily log file.
This intentionally does not include sandbox logs in `/feedback` yet;
scrubbing and attachment behavior can happen in a follow-up.
## Testing
- `cargo fmt -p codex-windows-sandbox`
- `cargo check -p codex-windows-sandbox`
- `cargo test -p codex-windows-sandbox`
- `cargo test -p codex-windows-sandbox logging::tests`
- `cargo clippy -p codex-windows-sandbox --all-targets -- -D warnings`
Add new enterprise requirement gate.
Validation:
- `cargo test -p codex-config --lib`
- `cargo test -p codex-app-server-protocol --lib`
- `cargo test -p codex-tui --lib debug_config`
- `cargo test -p codex-app-server --lib` *(fails: stack overflow in
`in_process::tests::in_process_start_initializes_and_handles_typed_v2_request`;
reproduces when run alone)*
## Why
Legacy `[profiles.<name>]` config tables and the legacy `profile`
selector are being retired in favor of profile files selected with
`--profile <name>`. After #23886 removed the CLI-side legacy profile
plumbing, the app-server config surface still exposed those fields and
still carried conversion code for the old protocol shape.
## What changed
- Remove `profile`, `profiles`, and `ProfileV2` from the app-server
config protocol/schema output so `config/read` no longer returns legacy
profile config.
- Drop the old v1 `UserSavedConfig` profile conversion path from
`config`.
- Reject new app-server config writes under `profiles.*` with the same
migration direction used for `profile`, while still allowing callers to
clear existing legacy profile tables.
- Refresh app-server config coverage and the experimental API README
example around the remaining `Config` nesting path.
## Verification
- Added config-manager coverage that `config/read` omits legacy profile
config, `profiles.*` writes are rejected, and existing legacy profile
tables can still be cleared.
- Updated the v2 config RPC test to cover the rejected `profiles.*`
batch-write path.
## Why
`codex sandbox` previously required an OS subcommand like `linux`,
`macos`, or `windows`, even though the command can only run the sandbox
backend available on the current host. That made the CLI imply a
cross-OS choice that does not exist.
## What changed
- Collapse `codex sandbox <os>` into `codex sandbox [COMMAND]...` by
wiring the `sandbox` parser directly to the host-specific backend args
with `cfg`.
- Keep the existing backend runners for Seatbelt, Linux sandbox, and
Windows restricted token.
- Rename the public Windows debug sandbox runner to
`run_command_under_windows_sandbox` for clarity.
- Update the Rust sandbox docs and related README references to describe
host OS selection and avoid pointing readers at legacy `sandbox_mode`
config.
## Arg0 compatibility
The `codex-linux-sandbox` helper path is still handled before normal CLI
parsing. `arg0_dispatch()` checks whether the executable basename is
`codex-linux-sandbox` and directly calls
`codex_linux_sandbox::run_main()`, so removing the `sandbox linux`
parser branch does not affect the arg0 helper flow.
## Verification
- `cargo test -p codex-cli`
- `cargo test -p codex-arg0`
- `just fix -p codex-cli`
## Why
The TUI currently creates a shared plaintext `codex-tui.log` under the
default log directory. That append-only file can keep growing across
runs even though the TUI already records diagnostics in bounded local
stores.
Make the plaintext file log an explicit troubleshooting choice instead
of a default side effect.
This is possible because logs are also stored in the DB with proper
rotation
## What changed
- Only install the TUI file logging layer when `log_dir` is explicitly
set.
- Remove the prior `codex-tui.log` at startup before an opt-in file
layer is created.
- Clarify the `log_dir` config/schema text and `docs/install.md` example
so users opt in with `codex -c log_dir=...` when they need a plaintext
log.
## Why
Remote compaction v2 sends a normal `/responses` request with a
compaction trigger. It should follow the retry semantics used by normal
Responses streaming calls for transient stream/request failures, while
keeping a smaller per-transport retry budget because compact attempts
can run much longer than normal turns.
## What changed
- Add a v2 compaction retry loop that uses `stream_max_retries`,
matching normal Responses turn retry mechanics.
- Cap the compact v2 retry budget at 2 retries per transport with
`min(stream_max_retries, 2)`.
- Retry retryable request-open and post-open stream collection failures
through the same loop.
- Use the existing 200ms exponential backoff and requested retry delay
handling used by normal turn retries.
- Emit the same `Reconnecting... n/max` stream-error notification
pattern.
- Fall back from WebSockets to HTTPS after the compact v2 stream retry
budget is exhausted, then reset the retry counter for HTTPS.
- Keep final remote-compaction failure logging after retries/fallback
are exhausted.
- Treat compact stream EOF before `response.completed` as a retryable
stream failure.
- Add compact v2 regression coverage with `request_max_retries = 0` and
`stream_max_retries = 2`, covering both request-open failure and
opened-stream EOF in one end-to-end test.
## Tests
- `just fmt`
- `cargo test -p codex-core remote_compact_v2`
- `just fix -p codex-core`
`cargo test` for the core and other crates fails on a fresh macOS
checkout without the right stack size variable. This change encourages
using the just test command that sets the environment up correctly.
As a bonus, this should encourage agents to get more benefit out of
nextest's parallel execution.
`#[serde(default)]` wasn't sufficient for our generated TS types to
reflect that clients didn't have to set them. We also need
`skip_serializing_if = "std::ops::Not::not"`. This is already a rule in
our agents.md file.
Updates our build script to pull down the artifacts like we do in CI for
building v8 into our targets.
This changes the flow so that we now pre-install rusty v8 assets for all
of our release targets from pre-built in workflow.
Secondarily if running it locally we now optionally pull the assets down
on python run assuming the user hasn't set the proper values, it then
provides them.
Sorry for the miss here.
## Why
`--profile` now selects `<name>.config.toml`, so the legacy `profile`
selector should not be reintroduced through config write or MCP tool
paths. A matching legacy selector in base user config also needs the
same migration guard as a matching legacy `[profiles.<name>]` table so
profile loading fails with one clear migration error instead of mixing
the old and new profile models.
## What
- reject non-null app-server config writes to the top-level legacy
`profile` selector
- make `--profile <name>` reject base user config that still selects the
same legacy `profile = "<name>"` value, alongside the existing matching
legacy profile-table guard
- reject removed MCP `codex` tool fields such as `profile` by denying
unknown tool-call parameters and exposing that restriction in the
generated schema
- add regression coverage for the app-server write paths, config loader
guard, and MCP tool input/schema behavior
## Verification
- targeted regression tests cover the new app-server, config loader, and
MCP rejection paths
## Summary
- drop the dead legacy profile usage metric and active-profile
conversation-start fields
- update role comments so they describe provider and service-tier
preservation without legacy config-profile wording
- pair the code cleanup with the file-backed profile docs update in
openai/developers-website#1476
## Testing
- `just fmt`
- `cargo test -p codex-otel`
- `cargo test -p codex-core` *(fails: existing stack overflow in
`mcp_tool_call::tests::guardian_mode_mcp_denial_returns_rationale_message`)*
- `cargo test -p codex-core --lib
mcp_tool_call::tests::guardian_mode_mcp_denial_returns_rationale_message`
*(fails with the same stack overflow)*
## Why
`/feedback` asks `ThreadManager` for the selected agent subtree before
it uploads logs. The previous live subtree path reconstructed
parent-child links by iterating every loaded thread and awaiting each
thread config snapshot, so unrelated loaded-thread state could stall
feedback subtree enumeration.
The loaded-thread set already belongs to
[`ThreadManagerState`](50e6644c94/codex-rs/core/src/thread_manager.rs).
Reading thread-spawn parents from the captured `CodexThread` session
sources at that boundary keeps unload and resume behavior manager-owned
while avoiding per-session config inspection.
## What Changed
- expose parent-child thread-spawn edges for loaded, non-internal
threads from `ThreadManagerState`
- build the live child map from those edges while keeping agent metadata
lookup and ordering in `AgentControl`
- add regression coverage for live subtree enumeration when no state DB
is available
## Validation
- `git diff --check`
- local Rust tests not run per request
## Why
[#23883](https://github.com/openai/codex/pull/23883) moved the
user-facing `--profile` flag onto profile v2 and
[#23886](https://github.com/openai/codex/pull/23886) removed CLI
forwarding for the legacy profile-v1 path. Core and TUI config
persistence still carried `active_profile` and
`ConfigEditsBuilder::with_profile`, which let later writes continue
targeting legacy `[profiles.<name>]` tables after profile selection
moved to profile-v2 config files.
## What
- Remove legacy profile routing from
[`ConfigEditsBuilder`](4b38e9c22e/codex-rs/core/src/config/edit.rs (L1064-L1294)),
so core config edits no longer carry `with_profile` or infer
`[profiles.*]` write targets from a `profile` key.
- Drop `active_profile` plumbing from runtime `Config`, TUI
startup/state, app-server config override forwarding, and Windows
sandbox setup persistence.
- Make app-server-backed TUI config edits use unscoped model,
service-tier, feature, Auto-review, plan-mode, and Windows sandbox paths
through
[`tui/src/config_update.rs`](4b38e9c22e/codex-rs/tui/src/config_update.rs (L43-L112)).
- Update config edit coverage so legacy `profile` state stays untouched
by direct model writes, and remove tests whose only contract was the
deleted profile-scoped persistence path.
## Testing
- Not run locally.
## Why
[#23883](https://github.com/openai/codex/pull/23883) moved user-facing
`--profile` selection onto profile v2, and
[#23886](https://github.com/openai/codex/pull/23886) removed the old CLI
`config_profile` override path. Core still had a second legacy path:
`profile = "..."` could select `[profiles.*]` values while runtime
config was built. Keeping that resolver alive preserves the old
precedence model and profile-carrying surfaces even though profile
selection now points at `$CODEX_HOME/<name>.config.toml`.
## What
- Reject legacy top-level `profile = "..."` config while loading runtime
config, with an error that points callers at `--profile <name>` and
`<name>.config.toml` in the [core load
path](3d923366ec/codex-rs/core/src/config/mod.rs (L2524-L2531)).
- Remove the remaining profile-v1 merge points from runtime config
resolution, including features, permissions, model/provider selection,
web search, Windows sandbox settings, TUI settings, role reloads, and
OSS provider lookup.
- Drop the leftover profile override surface from
[`ConfigOverrides`](3d923366ec/codex-rs/core/src/config/mod.rs (L2118-L2148))
and from the MCP server `codex` tool schema.
- Prune profile-precedence tests that only exercised the removed
resolver and replace them with rejection coverage for the legacy
selector.
## Testing
- Not run in this metadata pass.
- Added
[`legacy_profile_selection_is_rejected`](3d923366ec/codex-rs/core/src/config/config_tests.rs (L7942-L7965))
coverage for the new runtime guard.
## Why
`codex --profile <name> mcp ...` should reach the same profile-v2
migration guard as runtime commands. Otherwise legacy
`[profiles.<name>]` users see the generic command-scope rejection
instead of the existing guidance to move settings into
`$CODEX_HOME/<name>.config.toml`.
## What
- Allow `codex mcp` through the `--profile` subcommand gate.
- Pass profile loader overrides into the MCP entry point only to
validate profile-v2 migration when a profile is present.
- Keep MCP add/remove/list/get/login/logout behavior otherwise
unchanged; this does not add profile-scoped MCP server management.
- Cover the legacy profile migration error for `codex --profile work mcp
list`.
## Testing
- `cargo test -p codex-cli`
## Summary
- set `NODE_USE_ENV_PROXY=1` when Codex applies managed network proxy
environment overrides
- keep the Node opt-in in the proxy environment key set used by
shell/runtime env handling
- cover the new env var in the focused network proxy env test
## Why
Codex already sets HTTP proxy environment variables for child processes
when the managed network proxy is active. Node's built-in network
behavior needs the `NODE_USE_ENV_PROXY` opt-in to honor those env vars,
so Node-based skill scripts can otherwise skip the managed proxy path
and fail under restricted network access.
## Validation
- `just fmt` in `codex-rs`
- `cargo test -p codex-network-proxy` in `codex-rs`
## Summary
- Treat MCP tools with `readOnlyHint: true` as parallel-safe even when
`supports_parallel_tool_calls` is unset or `false`.
- Keep server-level `supports_parallel_tool_calls` as an additive
override for non-read-only tools.
- Add focused unit coverage for the MCP handler eligibility decision.
- Update RMCP integration coverage to keep the serial baseline on a
mutable tool, verify read-only concurrency without server opt-in, and
preserve the server opt-in concurrency path separately.
## Testing
- `just fmt`
- `cargo test -p codex-core --lib tools::handlers::mcp::tests::`
- `cargo test -p codex-core --test all
stdio_mcp_read_only_tool_calls_run_concurrently_without_server_opt_in`
- `cargo test -p codex-core --test all
stdio_mcp_parallel_tool_calls_opt_in_runs_concurrently`
- `cargo test -p codex-rmcp-client`
## Why
The `dev/cc/ref-def` branch preserves richer JSON Schema detail for
connector tools, including `$defs` and nested shapes. That improves
fidelity, but it pushes the largest connector schemas well past the
intended tool-schema budget. This PR adds a best-effort compaction pass
for unusually large tool input schemas so the p99 and max tails stay
small while ordinary schemas are left alone.
## What Changed
- Added best-effort large-schema compaction in
`codex-rs/tools/src/json_schema.rs` after schema sanitization and
definition pruning.
- Compaction runs as a waterfall only while the compact JSON budget
proxy is exceeded:
1. Strip schema `description` metadata.
2. Drop root `$defs` / `definitions`.
3. Collapse deep nested complex schema objects to `{}`.
- Kept top-level argument names and immediate schema shape where
possible.
## Corpus Results
Scope: 2,025 schemas under `golden_schemas`, all parsed successfully.
Token count is `o200k_base` over compact JSON from
`parse_tool_input_schema`.
| Percentile | Before `origin/main` `4dbca61e20` | After branch
`dev/cc/ref-def` `f9bf071758` | After this PR |
|---|---:|---:|---:|
| p0 | 9 | 9 | 9 |
| p10 | 59 | 63 | 63 |
| p25 | 81 | 86 | 86 |
| p50 | 114 | 127 | 125 |
| p75 | 174 | 205 | 202 |
| p90 | 295 | 335 | 322 |
| p95 | 391 | 526 | 422 |
| p99 | 794 | 1,303 | 689 |
| max | 2,836 | 3,337 | 887 |
After this PR, `0 / 2,025` schemas are over 1k tokens.
### Compaction Savings
These are cumulative waterfall stages over the same corpus. Later passes
only run for schemas that are still over the compact JSON budget proxy.
| Stage | Total tokens | Step savings | Schemas changed by step |
|---|---:|---:|---:|
| No compaction | 391,862 | - | - |
| Strip schema `description` metadata | 350,961 | 40,901 | 66 |
| Drop root `$defs` / `definitions` | 340,683 | 10,278 | 13 |
| Collapse deep complex schemas to `{}` | 335,875 | 4,808 | 6 |
## Why
Extension tools that need conversation context should be able to read it
from the live tool invocation instead of reaching into thread
persistence themselves.
## What changed
- Add a `ConversationHistory` snapshot to extension `ToolCall`s and
populate it from the current raw in-memory response history.
- Expose all history items at this boundary so each extension can filter
and bound the subset it needs before consuming or forwarding it.
- Cover the adapter and registry dispatch paths and update existing
extension tests that construct `ToolCall` literals.
## Test plan
- `cargo test -p codex-tools`
- `cargo test -p codex-extension-api`
- `cargo test -p codex-goal-extension`
- `cargo test -p codex-memories-extension`
- `cargo test -p codex-core passes_turn_fields_to_extension_call`
- `cargo test -p codex-core
extension_tool_executors_are_model_visible_and_dispatchable`
# Why
Some connector tool input schemas use local JSON Schema references and
definition tables to avoid duplicating large nested shapes. Codex
previously lowered these schemas into the supported subset in a way that
could discard `$ref`-only schema objects and lose the corresponding
definitions, which made non-strict tool registration less faithful than
the original connector schema.
This keeps the existing minimal-lowering policy: Codex still does not
raw-pass through arbitrary JSON Schema, but it now preserves local
reference structure that fits the Responses-compatible subset and prunes
definition entries that cannot be reached by following `$ref`s from the
root schema after sanitization, including refs found transitively inside
other reachable definitions. The pruning matters because Responses
parses definition tables even when entries are unused, so keeping dead
definitions wastes prompt tokens.
# What changed
- Added `$ref`, `$defs`, and legacy `definitions` fields to the tool
`JsonSchema` representation.
- Updated `parse_tool_input_schema` lowering so `$ref`-only schema
objects survive sanitization instead of becoming `{}`.
- Sanitized definition tables recursively and dropped malformed
definition tables so non-strict registration degrades gracefully.
- Added reachability pruning for root definition tables by starting from
refs outside definition tables, then following refs inside reachable
definitions.
- Added JSON Pointer decoding for local definition refs such as
`#/$defs/Foo~1Bar`.
# Verification
ran local golden-schema probes against representative connector schemas
to validate behavior on real generated schemas:
| Golden schema | Before bytes | After bytes | `$defs` before -> after |
`$ref` before -> after | Result |
|---|---:|---:|---:|---:|---|
| `google_calendar/create_space` | 7111 | 4526 | 7 -> 7 | 7 -> 7 | all
definitions preserved because all are reachable |
| `figma/apply_file_variable_changes` | 4609 | 999 | 8 -> 5 | 8 -> 5 |
unused defs pruned after unsupported `oneOf` shapes lower away |
| `snowflake/list_catalog_integrations` | 1380 | 404 | 3 -> 0 | 0 -> 0 |
all defs pruned because none are referenced |
| `dropbox/create_shared_link` | 8894 | 1836 | 14 -> 4 | 9 -> 4 | only
defs reachable from the root schema after sanitization are retained,
including transitively through other retained defs |
Token increase across golden schema due to this change:
<img width="817" height="366" alt="Screenshot 2026-05-19 at 1 47 04 PM"
src="https://github.com/user-attachments/assets/d5c80fe9-da85-41e6-8ac7-a01d1e0b0f71"
/>
## Summary
The auto-review runtime sync path was assigning a raw
`PermissionProfile` into `runtime_permission_profile_override`, whose
field now expects `RuntimePermissionProfileOverride`. That broke the TUI
Bazel build.
This changes the assignment to store
`RuntimePermissionProfileOverride::from_config(&self.config)`, matching
the other runtime override paths and preserving the active profile and
network metadata with the permission profile.
## Summary
- Add us-gov-west-1 to the Bedrock Mantle supported region list
- Cover the GovCloud endpoint URL in the existing base_url unit test
## Test
- cargo test -p codex-model-provider
## Why
Experimental feature toggles and memory settings can update several
related config values in one interaction. Keeping those writes local in
a remote TUI session is especially dangerous because the UI can diverge
from the app-server config while also leaving behind partially stale
supporting keys.
This is **[3 of 4]** in a stacked series that moves TUI-owned config
mutations onto app-server APIs.
## What changed
- Routed feature flag persistence through app-server batch writes,
including the supporting reviewer and permission updates used by
guardian approval.
- Routed Windows sandbox mode persistence and legacy Windows feature
cleanup through app-server writes.
- Routed memory settings through app-server batch writes and updated the
TUI tests to exercise the embedded app-server path.
## Config keys affected
- `features.<feature_key>`
- `profiles.<profile>.features.<feature_key>`
- `approval_policy`
- `sandbox_mode`
- `approvals_reviewer`
- `windows.sandbox`
- `features.experimental_windows_sandbox`
- `features.elevated_windows_sandbox`
- `features.enable_experimental_windows_sandbox`
- Profile-scoped Windows legacy feature variants under
`profiles.<profile>.features.*`
- `memories.use_memories`
- `memories.generate_memories`
- Profile-scoped memory variants under `profiles.<profile>.memories.*`
## Suggested manual validation
- Connect the TUI to a remote app server, toggle guardian approval on
and off, and confirm the remote config updates
`features.guardian_approval`, reviewer state, approval policy, and
sandbox mode coherently.
- Toggle a default-false experimental feature at the root level, disable
it again, and confirm the key clears instead of lingering as an
unnecessary explicit `false`.
- Change memory settings and confirm the remote config updates both
memory keys while the running TUI reflects the new state.
- On Windows, switch sandbox mode through the TUI and confirm
`windows.sandbox` is updated while the legacy Windows feature keys are
cleared.
## Stack
1. [#22913](https://github.com/openai/codex/pull/22913) `[1 of 4]`
primary settings writes
2. [#22914](https://github.com/openai/codex/pull/22914) `[2 of 4]` app
and skill enablement
3. [#22915](https://github.com/openai/codex/pull/22915) `[3 of 4]`
feature and memory toggles
4. [#22916](https://github.com/openai/codex/pull/22916) `[4 of 4]`
startup and onboarding bookkeeping
# What
When a normal hook fires inside a thread-spawned subagent, Codex now
includes these optional top-level fields in the hook input:
- `agent_id`: the child thread id
- `agent_type`: the subagent role
Root-agent hook inputs omit these fields. `SubagentStart` and
`SubagentStop` keep their existing required `agent_id` and `agent_type`
fields because those events are inherently subagent-scoped.
This does not change matcher behavior. Tool hooks still match on tool
name, compact hooks still match on trigger, and `UserPromptSubmit` still
ignores matchers. Only `SubagentStart` and `SubagentStop` match on
`agent_type`.
## Why
When remote control hits an auth failure such as a revoked or reused
refresh token, the websocket loop falls into reconnect backoff. If the
user fixes auth while that loop is sleeping, remote control can stay
offline until the old retry timer expires because nothing wakes the loop
or resets its exhausted auth recovery state.
## What Changed
Added an auth-change watch on `AuthManager` for refresh-relevant cached
auth updates.
The remote-control websocket loop now subscribes to that signal, resets
`UnauthorizedRecovery` and reconnect backoff when auth changes, and
retries immediately instead of waiting for the previous delay.
Updated the remote-control transport test to verify that reloading auth
with the now-available account id wakes enrollment before the prior
retry delay.
## Verification
`cargo test -p codex-app-server-transport
remote_control_waits_for_account_id_before_enrolling`
## Summary
- make rollout content search prefilter rollout files case-insensitively
- keep the no-ripgrep fallback scan and visible snippet matcher aligned
with that behavior
- cover a lowercase `thread/search` query matching mixed-case
conversation content
## Why
The rollout-backed `thread/search` path used exact string matching in
both its `rg` prefilter and semantic snippet generation. A content
result could be missed solely because the query casing did not match the
stored conversation text.
## Validation
- `just fmt`
- `cargo test -p codex-app-server thread_search_returns_content_matches`
- `cargo test -p codex-rollout`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `cargo build -p codex-cli`
- launched a local Electron dev instance with the rebuilt CLI binary
2026-05-21 14:14:01 -07:00
1189 changed files with 63518 additions and 21056 deletions
@@ -30,6 +30,7 @@ In the codex-rs folder where the rust code lives:
- Prefer private modules and explicitly exported public crate API.
- If you change `ConfigToml` or nested config types, run `just write-config-schema` to update `codex-rs/core/config.schema.json`.
- When working with MCP tool calls, prefer using `codex-rs/codex-mcp/src/mcp_connection_manager.rs` to handle mutation of tools and tool calls. Aim to minimize the footprint of changes and leverage existing abstractions rather than plumbing code through multiple levels of function calls.
- Do not call `reset_client_session` unnecessarily; let the incremental check logic decide whether to reuse the previous request.
- If you change Rust dependencies (`Cargo.toml` or `Cargo.lock`), run `just bazel-lock-update` from the
repo root to refresh `MODULE.bazel.lock`, and include that lockfile update in the same change.
- After dependency changes, run `just bazel-lock-check` from the repo root so lockfile drift is caught
@@ -52,12 +53,13 @@ In the codex-rs folder where the rust code lives:
the new implementation so the invariants stay close to the code that owns them.
- Avoid adding new standalone methods to `codex-rs/tui/src/chatwidget.rs` unless the change is
trivial; prefer new modules/files and keep `chatwidget.rs` focused on orchestration.
- When running Rust commands (e.g. `just fix` or `cargo test`) be patient with the command and never try to kill them using the PID. Rust lock can make the execution slow, this is expected.
- When running Rust commands (e.g. `just fix` or `just test`) be patient with the command and never try to kill them using the PID. Rust lock can make the execution slow, this is expected.
Run `just fmt` (in `codex-rs` directory) automatically after you have finished making Rust code changes; do not ask for approval to run it. Additionally, run the tests:
Run `just fmt` (in the `codex-rs` directory) automatically after you have finished making code changes anywhere in this repository; do not ask for approval to run it. Additionally, run the tests:
1.Run the test for the specific project that was changed. For example, if changes were made in `codex-rs/tui`, run `cargo test -p codex-tui`.
2.Once those pass, if any changes were made in common, core, or protocol, run the complete test suite with `cargo test` (or `just test` if `cargo-nextest` is installed). Avoid `--all-features` for routine local runs because it expands the build matrix and can significantly increase `target/` disk usage; use it only when you specifically need full feature coverage. project-specific or individual tests can be run without asking the user, but do ask the user before running the complete test suite.
1.Do not run `cargo test` directly. Use `just test` so test execution follows the repo defaults.
2.Run the test for the specific project that was changed. For example, if changes were made in `codex-rs/tui`, run `just test -p codex-tui`.
3. Once those pass, if any changes were made in common, core, or protocol, run the complete test suite with `just test`. Avoid `--all-features` for routine local runs because it expands the build matrix and can significantly increase `target/` disk usage; use it only when you specifically need full feature coverage. project-specific or individual tests can be run without asking the user, but do ask the user before running the complete test suite.
Before finalizing a large change to `codex-rs`, run `just fix -p <project>` (in `codex-rs` directory) to fix any linter issues in the code. Prefer scoping with `-p` to avoid slow workspace‑wide Clippy builds; only run `just fix` without `-p` if you changed shared crates. Do not re-run tests after running `fix` or `fmt`.
@@ -108,6 +110,19 @@ See `codex-rs/tui/styles.md`.
## Tests
### Test module organization
- When adding a new test module, define its contents in a separate sibling file rather than inline in the implementation file.
- Use an explicit `#[path = "..._tests.rs"]` attribute so the test filename is descriptive and easy to locate:
```rust
#[cfg(test)]
#[path = "parser_tests.rs"]
mod tests;
```
- This applies only when introducing a new test module. Do not move or rewrite existing inline `#[cfg(test)] mod tests { ... }` modules solely to follow this convention.
### Snapshot tests
This repo uses snapshot tests (via `insta`), especially in `codex-rs/tui`, to validate rendered output.
@@ -120,7 +135,7 @@ is easy to review and future diffs stay visual.
When UI or text output changes intentionally, update the snapshots as follows:
- Run tests to generate any updated snapshots:
-`cargo test -p codex-tui`
- `just test -p codex-tui`
- Check what’s pending:
- `cargo insta pending-snapshots -p codex-tui`
- Review changes by reading the generated `*.snap.new` files directly in the repo, or preview a specific file:
@@ -214,6 +229,15 @@ These guidelines apply to app-server protocol work in `codex-rs`, especially:
- Regenerate schema fixtures when API shapes change:
`just write-app-server-schema`
(and `just write-app-server-schema --experimental` when experimental API fixtures are affected).
- Validate with `cargo test -p codex-app-server-protocol`.
- Validate with `just test -p codex-app-server-protocol`.
- Avoid boilerplate tests that only assert experimental field markers for individual
request fields in `common.rs`; rely on schema generation/tests and behavioral coverage instead.
## Python Development Best Practices
### Ignore Python 2 compatibility
This project uses Python 3+. You should not use the `__future__` module.
If you need to worry about feature compatibility between different 3.xx point releases, check the
closest `pyproject.toml`'s `requires-python` field to see what minimum runtime version is supported.
"description":"Configures who approval requests are routed to for review. Examples include sandbox escapes, blocked network access, MCP approval prompts, and ARC escalations. Defaults to `user`. `auto_review` uses a carefully prompted subagent to gather relevant context and apply a risk-based decision framework before approving or denying the request. The legacy value `guardian_subagent` is accepted for compatibility.",
"enum":[
@@ -423,7 +445,6 @@
]
},
"includeLayers":{
"default":false,
"type":"boolean"
}
},
@@ -721,8 +742,7 @@
}
},
"required":[
"classification",
"includeLogs"
"classification"
],
"type":"object"
},
@@ -1034,7 +1054,6 @@
"GetAccountParams":{
"properties":{
"refreshToken":{
"default":false,
"description":"When `true`, requests a proactive token refresh before returning.\n\nIn managed auth mode this triggers the normal refresh-token flow. In external auth mode this flag is ignored. Clients should refresh tokens themselves and call `account/login/start` with `chatgptAuthTokens`.",
"type":"boolean"
}
@@ -1066,6 +1085,8 @@
},
"ImageDetail":{
"enum":[
"auto",
"low",
"high",
"original"
],
@@ -1146,6 +1167,12 @@
"integer",
"null"
]
},
"threadId":{
"type":[
"string",
"null"
]
}
},
"type":"object"
@@ -2961,6 +2988,20 @@
],
"type":"object"
},
"SkillsExtraRootsSetParams":{
"properties":{
"extraRoots":{
"items":{
"$ref":"#/definitions/AbsolutePathBuf"
},
"type":"array"
}
},
"required":[
"extraRoots"
],
"type":"object"
},
"SkillsListParams":{
"properties":{
"cwds":{
@@ -3424,7 +3465,6 @@
"ThreadReadParams":{
"properties":{
"includeTurns":{
"default":false,
"description":"When true, include turns and their items from rollout history.",
"type":"boolean"
},
@@ -3517,6 +3557,42 @@
}
]
},
"ThreadResumeInitialTurnsPageParams":{
"properties":{
"itemsView":{
"anyOf":[
{
"$ref":"#/definitions/TurnItemsView"
},
{
"type":"null"
}
],
"description":"How much item detail to include for each returned turn; defaults to summary."
},
"limit":{
"description":"Optional turn page size.",
"format":"uint32",
"minimum":0.0,
"type":[
"integer",
"null"
]
},
"sortDirection":{
"anyOf":[
{
"$ref":"#/definitions/SortDirection"
},
{
"type":"null"
}
],
"description":"Optional turn pagination direction; defaults to descending."
}
},
"type":"object"
},
"ThreadResumeParams":{
"description":"There are three ways to resume a thread: 1. By thread_id: load the thread from disk by thread_id and resume it. 2. By history: instantiate the thread from memory and resume it. 3. By path: load the thread from disk by path and resume it.\n\nFor non-running threads, the precedence is: history > non-empty path > thread_id. If using history or a non-empty path for a non-running thread, the thread_id param will be ignored.\n\nIf thread_id identifies a running thread, app-server rejoins that thread and treats a non-empty path as a consistency check against the active rollout path. Empty string path values are treated as absent.\n\nPrefer using thread_id whenever possible.",
"properties":{
@@ -3923,6 +3999,12 @@
],
"description":"Override where approval requests are routed for review on this turn and subsequent turns."
},
"clientUserMessageId":{
"type":[
"string",
"null"
]
},
"cwd":{
"description":"Override the working directory for this turn and subsequent turns.",
"type":[
@@ -4009,6 +4091,12 @@
},
"TurnSteerParams":{
"properties":{
"clientUserMessageId":{
"type":[
"string",
"null"
]
},
"expectedTurnId":{
"description":"Required active turn id precondition. The request fails when it does not match the currently active turn.",
"description":"Enterprise-managed config layer delivered by the cloud config bundle.",
"properties":{
"id":{
"description":"Stable identifier for the delivered layer.",
"type":"string"
},
"name":{
"description":"Admin-facing name for the delivered layer. This is surfaced in diagnostics so users know which cloud layer needs administrator attention.",
"type":"string"
},
"type":{
"enum":[
"enterpriseManaged"
],
"title":"EnterpriseManagedConfigLayerSourceType",
"type":"string"
}
},
"required":[
"id",
"name",
"type"
],
"title":"EnterpriseManagedConfigLayerSource",
"type":"object"
},
{
"description":"User config layer from $CODEX_HOME/config.toml. This layer is special in that it is expected to be: - writable by the user - generally outside the workspace directory",
"description":"When `true`, requests a proactive token refresh before returning.\n\nIn managed auth mode this triggers the normal refresh-token flow. In external auth mode this flag is ignored. Clients should refresh tokens themselves and call `account/login/start` with `chatgptAuthTokens`.",
"type":"boolean"
}
@@ -10046,6 +10118,7 @@
"sessionFlags",
"plugin",
"cloudRequirements",
"cloudManagedConfig",
"legacyManagedConfigFile",
"legacyManagedConfigMdm",
"unknown"
@@ -10148,6 +10221,8 @@
},
"ImageDetail":{
"enum":[
"auto",
"low",
"high",
"original"
],
@@ -10353,6 +10428,12 @@
"integer",
"null"
]
},
"threadId":{
"type":[
"string",
"null"
]
}
},
"title":"ListMcpServerStatusParams",
@@ -10944,6 +11025,47 @@
"title":"McpResourceReadResponse",
"type":"object"
},
"McpServerInfo":{
"description":"Presentation metadata advertised by an initialized MCP server.",
"properties":{
"description":{
"type":[
"string",
"null"
]
},
"icons":{
"items":true,
"type":[
"array",
"null"
]
},
"name":{
"type":"string"
},
"title":{
"type":[
"string",
"null"
]
},
"version":{
"type":"string"
},
"websiteUrl":{
"type":[
"string",
"null"
]
}
},
"required":[
"name",
"version"
],
"type":"object"
},
"McpServerMigration":{
"properties":{
"name":{
@@ -11054,6 +11176,16 @@
},
"type":"array"
},
"serverInfo":{
"anyOf":[
{
"$ref":"#/definitions/v2/McpServerInfo"
},
{
"type":"null"
}
]
},
"tools":{
"additionalProperties":{
"$ref":"#/definitions/v2/Tool"
@@ -11774,7 +11906,7 @@
"NetworkUnixSocketPermission":{
"enum":[
"allow",
"none"
"deny"
],
"type":"string"
},
@@ -13173,107 +13305,6 @@
],
"type":"object"
},
"ProfileV2":{
"additionalProperties":true,
"properties":{
"approval_policy":{
"anyOf":[
{
"$ref":"#/definitions/v2/AskForApproval"
},
{
"type":"null"
}
]
},
"approvals_reviewer":{
"anyOf":[
{
"$ref":"#/definitions/v2/ApprovalsReviewer"
},
{
"type":"null"
}
],
"description":"[UNSTABLE] Optional profile-level override for where approval requests are routed for review. If omitted, the enclosing config default is used."
"description":"There are three ways to resume a thread: 1. By thread_id: load the thread from disk by thread_id and resume it. 2. By history: instantiate the thread from memory and resume it. 3. By path: load the thread from disk by path and resume it.\n\nFor non-running threads, the precedence is: history > non-empty path > thread_id. If using history or a non-empty path for a non-running thread, the thread_id param will be ignored.\n\nIf thread_id identifies a running thread, app-server rejoins that thread and treats a non-empty path as a consistency check against the active rollout path. Empty string path values are treated as absent.\n\nPrefer using thread_id whenever possible.",
@@ -18408,6 +18508,12 @@
],
"description":"Override where approval requests are routed for review on this turn and subsequent turns."
},
"clientUserMessageId":{
"type":[
"string",
"null"
]
},
"cwd":{
"description":"Override the working directory for this turn and subsequent turns.",
"description":"Enterprise-managed config layer delivered by the cloud config bundle.",
"properties":{
"id":{
"description":"Stable identifier for the delivered layer.",
"type":"string"
},
"name":{
"description":"Admin-facing name for the delivered layer. This is surfaced in diagnostics so users know which cloud layer needs administrator attention.",
"type":"string"
},
"type":{
"enum":[
"enterpriseManaged"
],
"title":"EnterpriseManagedConfigLayerSourceType",
"type":"string"
}
},
"required":[
"id",
"name",
"type"
],
"title":"EnterpriseManagedConfigLayerSource",
"type":"object"
},
{
"description":"User config layer from $CODEX_HOME/config.toml. This layer is special in that it is expected to be: - writable by the user - generally outside the workspace directory",
"description":"When `true`, requests a proactive token refresh before returning.\n\nIn managed auth mode this triggers the normal refresh-token flow. In external auth mode this flag is ignored. Clients should refresh tokens themselves and call `account/login/start` with `chatgptAuthTokens`.",
"type":"boolean"
}
@@ -6526,6 +6598,7 @@
"sessionFlags",
"plugin",
"cloudRequirements",
"cloudManagedConfig",
"legacyManagedConfigFile",
"legacyManagedConfigMdm",
"unknown"
@@ -6628,6 +6701,8 @@
},
"ImageDetail":{
"enum":[
"auto",
"low",
"high",
"original"
],
@@ -6882,6 +6957,12 @@
"integer",
"null"
]
},
"threadId":{
"type":[
"string",
"null"
]
}
},
"title":"ListMcpServerStatusParams",
@@ -7473,6 +7554,47 @@
"title":"McpResourceReadResponse",
"type":"object"
},
"McpServerInfo":{
"description":"Presentation metadata advertised by an initialized MCP server.",
"properties":{
"description":{
"type":[
"string",
"null"
]
},
"icons":{
"items":true,
"type":[
"array",
"null"
]
},
"name":{
"type":"string"
},
"title":{
"type":[
"string",
"null"
]
},
"version":{
"type":"string"
},
"websiteUrl":{
"type":[
"string",
"null"
]
}
},
"required":[
"name",
"version"
],
"type":"object"
},
"McpServerMigration":{
"properties":{
"name":{
@@ -7583,6 +7705,16 @@
},
"type":"array"
},
"serverInfo":{
"anyOf":[
{
"$ref":"#/definitions/McpServerInfo"
},
{
"type":"null"
}
]
},
"tools":{
"additionalProperties":{
"$ref":"#/definitions/Tool"
@@ -8303,7 +8435,7 @@
"NetworkUnixSocketPermission":{
"enum":[
"allow",
"none"
"deny"
],
"type":"string"
},
@@ -9702,107 +9834,6 @@
],
"type":"object"
},
"ProfileV2":{
"additionalProperties":true,
"properties":{
"approval_policy":{
"anyOf":[
{
"$ref":"#/definitions/AskForApproval"
},
{
"type":"null"
}
]
},
"approvals_reviewer":{
"anyOf":[
{
"$ref":"#/definitions/ApprovalsReviewer"
},
{
"type":"null"
}
],
"description":"[UNSTABLE] Optional profile-level override for where approval requests are routed for review. If omitted, the enclosing config default is used."
"description":"There are three ways to resume a thread: 1. By thread_id: load the thread from disk by thread_id and resume it. 2. By history: instantiate the thread from memory and resume it. 3. By path: load the thread from disk by path and resume it.\n\nFor non-running threads, the precedence is: history > non-empty path > thread_id. If using history or a non-empty path for a non-running thread, the thread_id param will be ignored.\n\nIf thread_id identifies a running thread, app-server rejoins that thread and treats a non-empty path as a consistency check against the active rollout path. Empty string path values are treated as absent.\n\nPrefer using thread_id whenever possible.",
@@ -16232,6 +16332,12 @@
],
"description":"Override where approval requests are routed for review on this turn and subsequent turns."
},
"clientUserMessageId":{
"type":[
"string",
"null"
]
},
"cwd":{
"description":"Override the working directory for this turn and subsequent turns.",
"description":"Enterprise-managed config layer delivered by the cloud config bundle.",
"properties":{
"id":{
"description":"Stable identifier for the delivered layer.",
"type":"string"
},
"name":{
"description":"Admin-facing name for the delivered layer. This is surfaced in diagnostics so users know which cloud layer needs administrator attention.",
"type":"string"
},
"type":{
"enum":[
"enterpriseManaged"
],
"title":"EnterpriseManagedConfigLayerSourceType",
"type":"string"
}
},
"required":[
"id",
"name",
"type"
],
"title":"EnterpriseManagedConfigLayerSource",
"type":"object"
},
{
"description":"User config layer from $CODEX_HOME/config.toml. This layer is special in that it is expected to be: - writable by the user - generally outside the workspace directory",
"properties":{
@@ -642,107 +656,6 @@
],
"type":"string"
},
"ProfileV2":{
"additionalProperties":true,
"properties":{
"approval_policy":{
"anyOf":[
{
"$ref":"#/definitions/AskForApproval"
},
{
"type":"null"
}
]
},
"approvals_reviewer":{
"anyOf":[
{
"$ref":"#/definitions/ApprovalsReviewer"
},
{
"type":"null"
}
],
"description":"[UNSTABLE] Optional profile-level override for where approval requests are routed for review. If omitted, the enclosing config default is used."
"description":"Enterprise-managed config layer delivered by the cloud config bundle.",
"properties":{
"id":{
"description":"Stable identifier for the delivered layer.",
"type":"string"
},
"name":{
"description":"Admin-facing name for the delivered layer. This is surfaced in diagnostics so users know which cloud layer needs administrator attention.",
"type":"string"
},
"type":{
"enum":[
"enterpriseManaged"
],
"title":"EnterpriseManagedConfigLayerSourceType",
"type":"string"
}
},
"required":[
"id",
"name",
"type"
],
"title":"EnterpriseManagedConfigLayerSource",
"type":"object"
},
{
"description":"User config layer from $CODEX_HOME/config.toml. This layer is special in that it is expected to be: - writable by the user - generally outside the workspace directory",
"description":"When `true`, requests a proactive token refresh before returning.\n\nIn managed auth mode this triggers the normal refresh-token flow. In external auth mode this flag is ignored. Clients should refresh tokens themselves and call `account/login/start` with `chatgptAuthTokens`.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"description":"How much item detail to include for each returned turn; defaults to summary."
},
"limit":{
"description":"Optional turn page size.",
"format":"uint32",
"minimum":0.0,
"type":[
"integer",
"null"
]
},
"sortDirection":{
"anyOf":[
{
"$ref":"#/definitions/SortDirection"
},
{
"type":"null"
}
],
"description":"Optional turn pagination direction; defaults to descending."
}
},
"type":"object"
},
"TurnItemsView":{
"oneOf":[
{
"description":"`items` was not loaded for this turn. The field is intentionally empty.",
"enum":[
"notLoaded"
],
"type":"string"
},
{
"description":"`items` contains only a display summary for this turn.",
"enum":[
"summary"
],
"type":"string"
},
{
"description":"`items` contains every ThreadItem available from persisted app-server history for this turn.",
"enum":[
"full"
],
"type":"string"
}
]
}
},
"description":"There are three ways to resume a thread: 1. By thread_id: load the thread from disk by thread_id and resume it. 2. By history: instantiate the thread from memory and resume it. 3. By path: load the thread from disk by path and resume it.\n\nFor non-running threads, the precedence is: history > non-empty path > thread_id. If using history or a non-empty path for a non-running thread, the thread_id param will be ignored.\n\nIf thread_id identifies a running thread, app-server rejoins that thread and treats a non-empty path as a consistency check against the active rollout path. Empty string path values are treated as absent.\n\nPrefer using thread_id whenever possible.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
},
"AdditionalContextEntry":{
"properties":{
"kind":{
"$ref":"#/definitions/AdditionalContextKind"
},
"value":{
"type":"string"
}
},
"required":[
"kind",
"value"
],
"type":"object"
},
"AdditionalContextKind":{
"enum":[
"untrusted",
"application"
],
"type":"string"
},
"ApprovalsReviewer":{
"description":"Configures who approval requests are routed to for review. Examples include sandbox escapes, blocked network access, MCP approval prompts, and ARC escalations. Defaults to `user`. `auto_review` uses a carefully prompted subagent to gather relevant context and apply a risk-based decision framework before approving or denying the request. The legacy value `guardian_subagent` is accepted for compatibility.",
"enum":[
@@ -101,6 +123,8 @@
},
"ImageDetail":{
"enum":[
"auto",
"low",
"high",
"original"
],
@@ -492,6 +516,12 @@
],
"description":"Override where approval requests are routed for review on this turn and subsequent turns."
},
"clientUserMessageId":{
"type":[
"string",
"null"
]
},
"cwd":{
"description":"Override the working directory for this turn and subsequent turns.",
// This file was generated by [ts-rs](https://github.com/Aleph-Alpha/ts-rs). Do not edit this file manually.
importtype{SortDirection}from"./SortDirection";
importtype{TurnItemsView}from"./TurnItemsView";
exporttypeThreadResumeInitialTurnsPageParams={
/**
* Optional turn page size.
*/
limit?: number|null,
/**
* Optional turn pagination direction; defaults to descending.
*/
sortDirection?: SortDirection|null,
/**
* How much item detail to include for each returned turn; defaults to summary.
*/
itemsView?: TurnItemsView|null,};
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.