## Why
The TUI still had a few low-risk dependencies flowing through the
transitional `legacy_core` namespace after the app-server migration.
These helpers either already have clearer non-core owners or are
presentation logic that does not belong in `codex-core`, so moving them
out reduces the compatibility surface without changing product behavior.
## What changed
This is a low-risk change, almost completely mechanical in nature.
- Route TUI Codex-home lookup through `codex-utils-home-dir`, use
`Config::log_dir` directly, and call
`codex-sandboxing::system_bwrap_warning` without going through
`legacy_core`.
- Move shared `codex resume` hint formatting from `codex-core` into
`codex-utils-cli`.
- Update CLI and TUI call sites to use the shared CLI utility, and keep
the resume-command behavior covered by tests in its new home.
## Verification
- `cargo test -p codex-utils-cli`
- `cargo test -p codex-utils-cli resume_command`
## Summary
Removes the feature since this is effectively on by default in all cases
where we should use it, or can be configured via models.json.
## Testing
- [x] unit tests pass
## Summary
- remove two redundant `PathBuf` clones in Windows sandbox setup tests
- fix current `rust-ci-full` Windows clippy failures on `main`
## Validation
- `just fmt`
- attempted on `dev`: `cargo clippy --target x86_64-pc-windows-msvc
--tests --profile dev --timings -- -D warnings`
- blocked by missing MSVC cross toolchain on the Linux devbox (`lib.exe`
/ MSVC C toolchain unavailable)
- live failure evidence: main `rust-ci-full` runs 25880209898 and
25879137967 failed on `windows-sandbox-rs/src/bin/setup_main/win.rs`
with `clippy::redundant_clone` at the two edited callsites
## Summary
- remove the app-server `plugin-read` serialization queue from
`plugin/list` and `plugin/read`
- allow plugin read/list requests to start immediately instead of
waiting behind other plugin read/list requests
## Test plan
- `just fmt`
- `cargo test -p codex-app-server-protocol`
Created a `rust-release-prepare` environment with the necessary API key
as an environment secret. The workflow now uses that instead of the
action secret.
Once this merges and I confirm it works as intended, I'll remove the
action secret.
## Summary
This change lets `forced_chatgpt_workspace_id` accept multiple workspace
IDs instead of a single value.
It keeps the existing config key name, adds backward-compatible parsing
for a single string in `config.toml`, and normalizes the setting into an
allowed workspace list across login enforcement, app-server config
surfaces, and local ChatGPT auth helpers.
## Why
Workspace-restricted deployments may need to allow more than one ChatGPT
workspace without dropping the guardrail entirely.
## Server-side impact
Codex's local server and app-server protocol needed changes because they
previously assumed a single workspace ID. The local login flow now
matches the auth backend interface by sending the allowed workspace list
as a single comma-separated `allowed_workspace_id` query parameter.
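As a rough illustration of the normalization described above, the sketch below accepts either the legacy single string or a list and produces the comma-separated `allowed_workspace_id` value. The `ForcedWorkspaceIds` type and both function names are hypothetical, and the trimming and de-duplication rules are assumptions rather than confirmed behavior.

```rust
/// Hypothetical stand-in for the parsed `forced_chatgpt_workspace_id` value:
/// either the legacy single string or the new list form.
#[derive(Debug, Clone, PartialEq)]
pub enum ForcedWorkspaceIds {
    Single(String),
    Many(Vec<String>),
}

impl ForcedWorkspaceIds {
    /// Normalize both shapes into one allowed-workspace list.
    /// Assumption: entries are trimmed, and empty or duplicate IDs are dropped.
    pub fn into_allowed_list(self) -> Vec<String> {
        let raw = match self {
            ForcedWorkspaceIds::Single(id) => vec![id],
            ForcedWorkspaceIds::Many(ids) => ids,
        };
        let mut allowed: Vec<String> = Vec::new();
        for id in raw {
            let id = id.trim().to_string();
            if !id.is_empty() && !allowed.contains(&id) {
                allowed.push(id);
            }
        }
        allowed
    }
}

/// Build the single comma-separated `allowed_workspace_id` query value,
/// or `None` when no workspace restriction is configured.
pub fn allowed_workspace_query_value(allowed: &[String]) -> Option<String> {
    if allowed.is_empty() {
        None
    } else {
        Some(allowed.join(","))
    }
}
```

A single-string config and a one-element list normalize to the same allowed list, which is what keeps the old config shape backward compatible in this sketch.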
## Validation
This was tested manually with:
- A single-workspace config
- Multi-workspace configs, including one where the user belongs to only
a subset of the configured workspaces
All were successful.
Automated coverage:
- `cargo test -p codex-login`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-tui local_chatgpt_auth`
- `cargo test --locked -p codex-app-server
login_account_chatgpt_includes_forced_workspace_allowlist_query_param`
## Why
Some core integration-test paths were creating Codex state under ambient
`~/.codex`. In environments where `HOME=/tmp`, that showed up as
`/tmp/.codex`, which is host-level shared state and makes these tests
environment/order sensitive.
The affected paths were:
- `core/tests/suite/live_cli.rs`: `run_live()` spawned the real CLI with
a temp cwd, but without an isolated home, so the child resolved Codex
home from ambient `HOME`.
- core / exec-server integration test binaries using
`configure_test_binary_dispatch(...)`: their startup ctor installs arg0
helper aliases like `apply_patch` and `codex-linux-sandbox`. Full
`arg0_dispatch()` also installs aliases from ambient Codex-home
resolution, so test-binary startup could create `CODEX_HOME/tmp/arg0`;
with `HOME=/tmp`, that became `/tmp/.codex/tmp/arg0/...`.
## What changed
- `live_cli` now gives the spawned CLI a temp `HOME` and temp
`CODEX_HOME`.
- arg0 alias setup now has an explicit-home form,
`prepend_path_entry_for_codex_aliases_in(...)`, so test helpers can
place alias state under a temp directory without relying on ambient
`CODEX_HOME`.
- helper re-entry behavior is preserved with
`dispatch_arg0_if_needed()`, so aliases like `apply_patch` and
`codex-linux-sandbox` still dispatch correctly before test alias
installation.
- core test support keeps the temp Codex home alive for the lifetime of
the test binary, matching the alias lifetime.
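The `live_cli` isolation can be sketched with `std::process::Command` alone. This is a minimal illustration, not the test's actual helper: `isolated_cli_command`, `configured_env`, and the `home` / `codex-home` subdirectory names are all hypothetical.

```rust
use std::path::Path;
use std::process::Command;

/// Build a child command whose `HOME` and `CODEX_HOME` both point inside
/// `temp_root`, so the child cannot resolve Codex home from the ambient
/// environment. The subdirectory names are assumptions for illustration.
pub fn isolated_cli_command(program: &str, temp_root: &Path) -> Command {
    let home = temp_root.join("home");
    let codex_home = temp_root.join("codex-home");
    let mut cmd = Command::new(program);
    cmd.current_dir(temp_root)
        .env("HOME", &home)
        .env("CODEX_HOME", &codex_home);
    cmd
}

/// Look up an environment override that was explicitly set on a command.
pub fn configured_env(cmd: &Command, key: &str) -> Option<String> {
    cmd.get_envs()
        .find(|(name, _)| name.to_str() == Some(key))
        .and_then(|(_, value)| value)
        .map(|value| value.to_string_lossy().into_owned())
}
```

Setting both variables matters in this sketch because a child that honors `CODEX_HOME` first and falls back to `HOME` would otherwise still reach ambient state through whichever one was left unset.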
## Verification
Verified on `dev2` with `HOME=/tmp` that the focused core test-binary
startup path no longer recreates `/tmp/.codex`.
Also checked the exact `live_cli` test path under `HOME=/tmp`; on `dev2`
it still hits the existing remote-only `cargo_bin("codex-rs")`
resolution failure before spawning the child, but `/tmp/.codex` remains
absent after the run.
## Why
The Docker remote-env coverage was failing before it reached the
behavior those tests are meant to exercise. The remote-aware test
fixture only registered the remote environment, so tests that
intentionally select both `local` and `remote` could not start a turn.
After that was fixed, two tests exposed stale fixtures: the approval
test was auto-approving under workspace-write, and the remote
`view_image` test was writing invalid PNG bytes.
## What Changed
- Added `EnvironmentManager::create_for_tests_with_local(...)` so tests
can keep the provider default while also selecting `local` explicitly.
- Updated `build_remote_aware()` to use that test-only manager when a
remote exec-server URL is present.
- Changed the remote apply-patch approval helper to use
`SandboxPolicy::new_read_only_policy()` so the test actually exercises
approval caching per environment.
- Replaced the hardcoded remote `view_image` PNG blob with the existing
`png_bytes(...)` helper so the test uses a valid image fixture.
## Validation
Ran these isolated Docker remote-env tests on the devbox with
`$remote-tests` setup:
- `suite::remote_env::apply_patch_freeform_routes_to_selected_remote_environment`
- `suite::remote_env::apply_patch_approvals_are_remembered_per_environment`
- `suite::remote_env::apply_patch_intercepted_exec_command_routes_to_selected_remote_environment`
- `suite::remote_env::exec_command_routes_to_selected_remote_environment`
- `suite::view_image::view_image_routes_to_selected_remote_environment`
All five pass.
## Why
`thread_start_params_include_review_policy_when_review_policy_is_manual_only`
builds a `Config` with a temporary `CODEX_HOME`, but
`ConfigBuilder::default()` can still load host-managed configuration. On
local macOS machines with enterprise-managed Codex config, that host
state can leak into the test and change the resulting config, even
though CI does not have the same managed config source.
This makes the test environment-dependent: it can pass in CI while
failing locally for developers who have managed configuration installed.
## What Changed
- Updated `codex-rs/exec/src/lib_tests.rs` so the test calls
`LoaderOverrides::without_managed_config_for_tests()` through
`ConfigBuilder::loader_overrides(...)`.
- Left the rest of the test setup intact, including the temporary
`CODEX_HOME`, temporary cwd, and explicit `approvals_reviewer` harness
override.
## Verification
```shell
cargo test -p codex-exec thread_start_params_include_review_policy_when_review_policy_is_manual_only
```
## Why
Some MCP OAuth providers require a pre-registered public client ID and
cannot rely on dynamic client registration. Codex already supports MCP
OAuth, but it had no way to supply that client ID from config into the
PKCE flow.
## What changed
- add `oauth.client_id` under `[mcp_servers.<server>]` config, including
config editing and schema generation
- thread the configured client ID through CLI, app-server, plugin login,
and MCP skill dependency OAuth entrypoints
- configure RMCP authorization with the explicit client when present,
while preserving the existing dynamic-registration path when it is
absent
- add focused coverage for config parsing/serialization and OAuth URL
generation
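The choice between the explicit client and dynamic registration can be pictured as below. This is only a sketch: `ClientRegistration` and `select_client_registration` are hypothetical names and not the RMCP API, and treating a blank `oauth.client_id` as absent is an assumption.

```rust
/// How the OAuth client identity is obtained for an MCP server.
/// Hypothetical type for illustration; this is not the RMCP API.
#[derive(Debug, PartialEq)]
pub enum ClientRegistration {
    /// Use the pre-registered public client ID from config.
    PreRegistered { client_id: String },
    /// No client ID configured: keep the dynamic-registration path.
    Dynamic,
}

/// Pick the registration mode from the optional `oauth.client_id` value.
/// Assumption: a blank value is treated the same as a missing one.
pub fn select_client_registration(configured_client_id: Option<&str>) -> ClientRegistration {
    match configured_client_id.map(str::trim) {
        Some(id) if !id.is_empty() => ClientRegistration::PreRegistered {
            client_id: id.to_string(),
        },
        _ => ClientRegistration::Dynamic,
    }
}
```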
## Verification
- `cargo test -p codex-config -p codex-rmcp-client -p codex-mcp -p
codex-core-plugins`
- `cargo test -p codex-core blocking_replace_mcp_servers_round_trips
--lib`
- `cargo test -p codex-core
replace_mcp_servers_streamable_http_serializes_oauth_resource --lib`
- `cargo test -p codex-core config_schema_matches_fixture --lib`
## Notes
Broader local package runs still hit unrelated pre-existing stack
overflows in:
- `codex-app-server::in_process_start_clamps_zero_channel_capacity`
- `codex-core::resume_agent_from_rollout_uses_edge_data_when_descendant_metadata_source_is_stale`
## Why
PR #21396 merged after #17141 removed the old
`ConfigLayerStack::get_user_layer()` API. The new plugin CLI call sites
still used that stale API, which caused `main` to fail compilation.
## What Changed
- update `codex plugin marketplace list` to read configured marketplaces
through `get_active_user_layer()`
- update the plugin snapshot validation helper to use
`get_active_user_layer()`
This preserves the intended active writable user-layer behavior from the
profile-aware config API while fixing the stale call sites.
## Validation
- `cargo check -p codex-cli`
- `cargo test -p codex-cli --test plugin_cli`
- `git diff --check`
## Summary
- For SIWC users, update the model list merging logic to prefer the
model list fetched from the backend over the bundled model list (this is
needed for special cases where users have a more limited set of models
they're allowed to use)
- Add or update tests covering the revised cache behavior
## Testing
- Added/updated unit tests in
`codex-rs/models-manager/src/manager_tests.rs`
- Not run (not requested)
## Why
Network approval prompts are rendered without a command string on the
app-server path. After the user approves one of those prompts, the TUI
history cell previously fell back to command-oriented copy and produced
malformed lines such as:
```text
You approved codex to run every time this session
```
That hid the network target the user actually approved and left a
visibly broken transcript entry.
## What changed
- Preserve the approval subject as either a command or a network target
when recording TUI approval decisions.
- Render target-aware history copy for network approval outcomes:
  - approve once
  - approve for the current session
  - cancel
- Include the approval protocol and preserve the managed-proxy
`network-access` target when present, including non-default ports such
as `https://example.com:8443`.
- Fall back to formatting the network approval context as
`protocol://host` when no generated target command is available.
- Keep ordinary command approval history, Guardian approval history, and
persisted network-rule history behavior unchanged.
- Add focused regression coverage and snapshots for the three
network-history cases.
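A minimal sketch of the subject split described above, covering only the session-wide approval line. `ApprovalSubject`, `network_target`, and `session_approval_line` are hypothetical names; the network wording is the example from this description, while the command wording is inferred from the malformed line and is an assumption.

```rust
/// What the user approved: a command or a network target.
/// Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum ApprovalSubject {
    Command(String),
    NetworkTarget(String),
}

/// Prefer the generated managed-proxy target (which may carry a non-default
/// port); otherwise fall back to `protocol://host`.
pub fn network_target(protocol: &str, host: &str, generated: Option<&str>) -> String {
    match generated {
        Some(target) => target.to_string(),
        None => format!("{protocol}://{host}"),
    }
}

/// History copy for a session-wide approval. The one-time and cancel
/// outcomes would branch on the subject in the same way.
pub fn session_approval_line(subject: &ApprovalSubject) -> String {
    match subject {
        // Assumed command wording, inferred from the malformed transcript line.
        ApprovalSubject::Command(cmd) => {
            format!("You approved codex to run {cmd} every time this session")
        }
        ApprovalSubject::NetworkTarget(target) => {
            format!("You approved codex network access to {target} every time this session")
        }
    }
}
```

Carrying the subject as an enum means a network approval can no longer fall through to command-oriented copy with an empty command string.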
## How to Test
1. Start Codex in a flow that triggers a network approval prompt.
2. Approve network access only for the current conversation.
3. Confirm the transcript records the approved network target, for
example:
- `You approved codex network access to https://example.com:8443 every
time this session`
4. Trigger the prompt again and verify the one-time approval and cancel
paths also record target-specific history text instead of an empty
command gap.
Targeted automated coverage:
- `cargo test -p codex-tui network_exec_approval_history`
## Additional verification
- `cargo insta pending-snapshots`
- `git diff --check`
- `just fix -p codex-tui`
- `just argument-comment-lint`
## Known unrelated local test noise
A full `cargo test -p codex-tui` run still hits a pre-existing stack
overflow outside this change:
- `tests::fork_last_filters_latest_session_by_cwd_unless_show_all`
aborts with a stack overflow
## Summary
- keep Git metadata/status subprocesses independent of repository
`core.fsmonitor` configuration
- preserve existing working-tree state reporting while making the helper
behavior more predictable
- add regression coverage for `get_has_changes` when a repository
defines an fsmonitor command
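One way to make a Git subprocess ignore repository `core.fsmonitor` configuration is a command-line override. The sketch below assumes that approach (`git -c core.fsmonitor=false ...`); this description does not say which mechanism the change uses, and `git_status_command` is a hypothetical helper name.

```rust
use std::path::Path;
use std::process::Command;

/// Build a `git status --porcelain` command that overrides any repository
/// `core.fsmonitor` setting for this one invocation.
/// Assumption: a `-c` override is used; the actual change may differ.
pub fn git_status_command(repo: &Path) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("-c")
        .arg("core.fsmonitor=false")
        .arg("status")
        .arg("--porcelain")
        .current_dir(repo);
    cmd
}
```

With the override in place, a repository that defines an fsmonitor hook command should not get that hook spawned by the status check.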
## Validation
- `cargo fmt --all`
- `cargo test -p codex-core test_get_has_changes_`
- `cargo test -p codex-git-utils`
## Why
Some sandboxed integration tests enabled both ambient temp roots
(`TMPDIR` and literal `/tmp`) even though they were not testing
temp-root behavior. On Linux bwrap, making `/tmp` writable causes
protected metadata mount targets such as `/tmp/.git`, `/tmp/.agents`,
and `/tmp/.codex` to be synthesized. If a run is interrupted, those
top-level markers can be left behind and contaminate later tests.
## What changed
For the incidental integration tests that do not need ambient temp-root
access, set `exclude_tmpdir_env_var` and `exclude_slash_tmp` to `true`.
Dedicated protected-metadata coverage remains in the lower-level sandbox
tests that use isolated temp roots.
## Verification
Focused remote devbox repros passed with a watcher polling `/tmp/.git`,
`/tmp/.agents`, and `/tmp/.codex`; no leaked markers were observed.
## Why
Plugin CLI installs should behave more like `apt-get install`:
configured marketplaces are the only install sources, the local
marketplace snapshot is the package index used at install time, and
`plugins/cache` is only a cache of already-downloaded plugin bytes.
That distinction matters once marketplaces and plugins have auth or
availability state. A repo-local marketplace manifest or leftover cached
plugin artifact should not silently become an install source unless the
marketplace was explicitly configured and its readable snapshot still
authorizes the plugin.
## What Changed
- add CLI commands to list configured marketplaces and add, list, or
remove marketplace plugins
- accept stable `plugin@marketplace` ids for add/remove while preserving
the explicit `--marketplace` form
- restrict `codex plugin add` and `codex plugin list` to configured
marketplaces instead of also discovering current-working-directory
marketplace roots
- fail `codex plugin add` and `codex plugin list` when a configured
marketplace snapshot is missing or malformed instead of treating it as
an empty source or a generic plugin miss
- preserve marketplace snapshot semantics: a configured local/Git
marketplace snapshot can authorize installs without consulting the
original upstream source
- allow `plugins/cache` reuse only after configured marketplace
resolution succeeds
- keep removal resilient after marketplace deletion or drift and ignore
malformed marketplace config entries in listing
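The two selector forms can be reconciled roughly as follows. This is a sketch under assumptions: `PluginSelector` and `parse_plugin_selector` are hypothetical names, and rejecting a selector whose inline marketplace disagrees with `--marketplace` is an assumed policy, not confirmed CLI behavior.

```rust
/// A resolved plugin selector. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub struct PluginSelector {
    pub plugin: String,
    pub marketplace: String,
}

/// Accept either `PLUGIN@MARKETPLACE` or `PLUGIN` plus a `--marketplace` value.
pub fn parse_plugin_selector(
    selector: &str,
    marketplace_flag: Option<&str>,
) -> Result<PluginSelector, String> {
    let (plugin, inline) = match selector.split_once('@') {
        Some((plugin, marketplace)) => (plugin, Some(marketplace)),
        None => (selector, None),
    };
    if plugin.is_empty() {
        return Err("plugin name must not be empty".to_string());
    }
    let marketplace = match (inline, marketplace_flag) {
        // Assumed policy: disagreeing sources are an error rather than a silent pick.
        (Some(inline), Some(flag)) if inline != flag => {
            return Err(format!("conflicting marketplaces: `{inline}` and `{flag}`"));
        }
        (Some(name), _) | (None, Some(name)) => name,
        (None, None) => {
            return Err("pass PLUGIN@MARKETPLACE or --marketplace MARKETPLACE".to_string());
        }
    };
    if marketplace.is_empty() {
        return Err("marketplace name must not be empty".to_string());
    }
    Ok(PluginSelector {
        plugin: plugin.to_string(),
        marketplace: marketplace.to_string(),
    })
}
```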
## Commands Added
- `codex plugin add <plugin>@<marketplace>`
- `codex plugin add <plugin> --marketplace <marketplace>`
- `codex plugin list`
- `codex plugin list --marketplace <marketplace>`
- `codex plugin remove <plugin>@<marketplace>`
- `codex plugin remove <plugin> --marketplace <marketplace>`
- `codex plugin marketplace add <source>`
- `codex plugin marketplace add <source> --ref <ref>`
- `codex plugin marketplace add <source> --sparse <path>`
- `codex plugin marketplace list`
- `codex plugin marketplace upgrade`
- `codex plugin marketplace upgrade <marketplace>`
- `codex plugin marketplace remove <marketplace>`
## CLI Help Output
<details>
<summary><code>codex plugin --help</code></summary>
```text
Manage Codex plugins
Usage: codex plugin [OPTIONS] <COMMAND>
Commands:
add Install a plugin from a configured marketplace snapshot
list List plugins available from configured marketplace snapshots
marketplace Add, list, upgrade, or remove configured plugin marketplaces
remove Remove an installed plugin from local config and cache
help Print this message or the help of the given subcommand(s)
```
</details>
<details>
<summary><code>codex plugin add --help</code></summary>
```text
Install a plugin from a configured marketplace snapshot.
Pass either `PLUGIN@MARKETPLACE` or pass `PLUGIN` with `--marketplace MARKETPLACE`.
Usage: codex plugin add [OPTIONS] <PLUGIN[@MARKETPLACE]>
Arguments:
<PLUGIN[@MARKETPLACE]>
Plugin selector to install: either PLUGIN@MARKETPLACE or PLUGIN with --marketplace
Options:
-m, --marketplace <MARKETPLACE>
Configured marketplace name to use when PLUGIN does not include @MARKETPLACE
Examples:
codex plugin add sample@debug
codex plugin add sample --marketplace debug
```
</details>
<details>
<summary><code>codex plugin list --help</code></summary>
```text
List plugins available from configured marketplace snapshots
Usage: codex plugin list [OPTIONS]
Options:
-m, --marketplace <MARKETPLACE>
Only list plugins from this configured marketplace name
Examples:
codex plugin list
codex plugin list --marketplace debug
```
</details>
<details>
<summary><code>codex plugin remove --help</code></summary>
```text
Remove an installed plugin from local config and cache.
Pass either `PLUGIN@MARKETPLACE` or pass `PLUGIN` with `--marketplace MARKETPLACE`.
Usage: codex plugin remove [OPTIONS] <PLUGIN[@MARKETPLACE]>
Arguments:
<PLUGIN[@MARKETPLACE]>
Plugin selector to remove: either PLUGIN@MARKETPLACE or PLUGIN with --marketplace
Options:
-m, --marketplace <MARKETPLACE>
Marketplace name to use when PLUGIN does not include @MARKETPLACE
Examples:
codex plugin remove sample@debug
codex plugin remove sample --marketplace debug
```
</details>
<details>
<summary><code>codex plugin marketplace --help</code></summary>
```text
Add, list, upgrade, or remove configured plugin marketplaces
Usage: codex plugin marketplace [OPTIONS] <COMMAND>
Commands:
add Add a local or Git marketplace to the configured marketplace sources
list List configured marketplace names and their local snapshot roots
upgrade Refresh configured Git marketplace snapshots
remove Remove a configured marketplace source by name
```
</details>
<details>
<summary><code>codex plugin marketplace add --help</code></summary>
```text
Add a local or Git marketplace to the configured marketplace sources
Usage: codex plugin marketplace add [OPTIONS] <SOURCE>
Arguments:
<SOURCE>
Marketplace source: a local path, owner/repo[@ref], HTTPS Git URL, or SSH Git URL
Options:
--ref <REF>
Git ref to fetch for Git marketplace sources
--sparse <PATH>
Sparse checkout path for Git marketplace sources. Can be repeated
Examples:
codex plugin marketplace add ./path/to/marketplace
codex plugin marketplace add owner/repo --ref main
codex plugin marketplace add https://github.com/owner/repo --sparse plugins/foo
```
</details>
<details>
<summary><code>codex plugin marketplace list --help</code></summary>
```text
List configured marketplace names and their local snapshot roots
Usage: codex plugin marketplace list [OPTIONS]
```
</details>
<details>
<summary><code>codex plugin marketplace upgrade --help</code></summary>
```text
Refresh configured Git marketplace snapshots.
Omit MARKETPLACE_NAME to upgrade all configured Git marketplaces.
Usage: codex plugin marketplace upgrade [OPTIONS] [MARKETPLACE_NAME]
Arguments:
[MARKETPLACE_NAME]
Optional configured marketplace name to upgrade. Omit to upgrade all Git marketplaces
Examples:
codex plugin marketplace upgrade
codex plugin marketplace upgrade debug
```
</details>
<details>
<summary><code>codex plugin marketplace remove --help</code></summary>
```text
Remove a configured marketplace source by name
Usage: codex plugin marketplace remove [OPTIONS] <MARKETPLACE_NAME>
Arguments:
<MARKETPLACE_NAME>
Configured marketplace name to remove
Example:
codex plugin marketplace remove debug
```
</details>
## Public Semantics
- `codex plugin add <plugin>@<marketplace>` succeeds only when
`<marketplace>` is configured and its local marketplace snapshot
contains `<plugin>`
- repo-local marketplaces are not install sources until the user runs
`codex plugin marketplace add ...`
- configured marketplace snapshots must be readable; missing or
malformed snapshots fail the CLI operation rather than silently falling
through to cache or empty results
- cached plugin artifacts can satisfy reinstall only when the configured
marketplace snapshot still authorizes that plugin
- cached plugin artifacts alone never make a plugin installable
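The rules above amount to a fixed resolution order, sketched below with in-memory stand-ins. All type and function names are hypothetical; the real CLI reads config and snapshot files rather than maps and sets.

```rust
use std::collections::{HashMap, HashSet};

/// State of a configured marketplace's local snapshot (hypothetical model).
pub enum Snapshot {
    /// Plugin names the readable snapshot authorizes.
    Readable(HashSet<String>),
    MissingOrMalformed,
}

#[derive(Debug, PartialEq)]
pub enum InstallPlan {
    ReuseCache,
    Download,
}

#[derive(Debug, PartialEq)]
pub enum InstallError {
    MarketplaceNotConfigured,
    SnapshotUnreadable,
    PluginNotInSnapshot,
}

/// Resolve an install request. The cache is consulted only after the
/// configured marketplace snapshot has authorized the plugin.
pub fn plan_install(
    configured: &HashMap<String, Snapshot>,
    cached: &HashSet<(String, String)>,
    plugin: &str,
    marketplace: &str,
) -> Result<InstallPlan, InstallError> {
    let snapshot = configured
        .get(marketplace)
        .ok_or(InstallError::MarketplaceNotConfigured)?;
    let authorized = match snapshot {
        Snapshot::Readable(plugins) => plugins,
        Snapshot::MissingOrMalformed => return Err(InstallError::SnapshotUnreadable),
    };
    if !authorized.contains(plugin) {
        return Err(InstallError::PluginNotInSnapshot);
    }
    if cached.contains(&(plugin.to_string(), marketplace.to_string())) {
        Ok(InstallPlan::ReuseCache)
    } else {
        Ok(InstallPlan::Download)
    }
}
```

Because the cache lookup is the last step, a leftover cached artifact for an unconfigured marketplace fails with `MarketplaceNotConfigured` instead of becoming an install source.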
## Tests
- `cargo test -p codex-cli --test plugin_cli`
- `cargo clippy -p codex-cli --tests -- -D warnings`
- `cargo test -p codex-cli`
- `git diff --check`
- `just bazel-lock-update`
- `just bazel-lock-check`
## Why
`ChatComposer` currently owns text editing alongside attachment
bookkeeping and popup lifecycle state, while `BottomPane` still triggers
a couple of popup resyncs after composer methods that already do that
work internally. That blurs the ownership boundary and makes the
composer harder to simplify safely.
This PR is part 1 of a two-part cleanup. It peels off the composer state
that can move cleanly on its own, so the follow-up can tackle the
heavier draft/editing boundary without mixing every concern into one
diff.
## What changed
- Move local and remote image bookkeeping, placeholder relabeling, and
remote-image keyboard selection into `AttachmentState`.
- Move active-popup and popup-dismissal/query bookkeeping into
`PopupState`.
- Update composer and history-search paths to use those state owners
directly.
- Remove redundant `BottomPane` popup synchronization after paste
handling and `insert_str`.
## Part 2
The follow-up PR will finish the cleanup around the remaining composer
boundary: split out the draft/editing-oriented state and footer/status
presentation concerns that still live in `ChatComposer`, then revisit
the leftover `BottomPane` pass-throughs once those ownership lines are
explicit. The goal is for `ChatComposer` to coordinate a few focused
collaborators instead of continuing to be the landing zone for every
input-path concern.
## Verification
Did manual smoke tests.
This is the same change that @bolinfest made; he could not push it
because of GitHub Actions workflow-change permissions.
## Why
The `rust-release` workflow can now be run manually with
`sign_macos=false` to skip macOS signing, but that path previously
stopped before creating a GitHub Release. That left the unsigned macOS
binaries available only as workflow-run artifacts, which are awkward to
fetch from automation and cannot be retrieved with a simple
unauthenticated `curl`.
For the unsigned path we still should not perform the normal release
side effects: no npm or Python publishing, no WinGet publishing, no
`latest-alpha-cli` branch update, and no promotion to GitHub's latest
release. The goal is only to make the build outputs easy to fetch from
the release page.
## What changed
- Allow the `release` job in `.github/workflows/rust-release.yml` to run
for `workflow_dispatch` runs with `sign_macos=false`.
- For unsigned runs, keep the unsigned macOS artifacts plus the normal
Linux and Windows release artifacts needed for DotSlash, then
create/update the GitHub Release with `make_latest: false`.
- Keep the normal publish/promote paths gated to signed releases:
  - npm staging and publish
  - Python runtime publish
  - WinGet publish
  - `latest-alpha-cli` update
  - developer-site deploy
  - normal DotSlash release files
- Add `.github/dotslash-unsigned-config.json`, which publishes
`*-unsigned` DotSlash files that use unsigned macOS artifacts and the
normal Linux/Windows artifacts.
## What I added
PLEASE READ THIS!!!
I added `codex-command-runner` and `codex-windows-sandbox-setup` entries
to `.github/dotslash-unsigned-config.json` so that `sign_macos=false`
runs still produce the DotSlash files for those artifacts, which Windows
builds require.
## Why
This is a small precursor to the larger permissions-migration work. Both
the comparison stack in
[#22401](https://github.com/openai/codex/pull/22401) /
[#22402](https://github.com/openai/codex/pull/22402) and the alternate
stack in [#22610](https://github.com/openai/codex/pull/22610) /
[#22611](https://github.com/openai/codex/pull/22611) /
[#22612](https://github.com/openai/codex/pull/22612) are easier to
review if the terminology is already settled underneath them.
Because `:project_roots` and `:danger-no-sandbox` have not shipped as
stable user-facing surface area, carrying them forward as aliases would
just add more migration logic to the later stacks. This PR removes that
ambiguity now so the follow-on work can rely on one spelling for each
built-in concept.
## What Changed
- renamed the config-facing special filesystem key from `:project_roots`
to `:workspace_roots`
- dropped unpublished `:project_roots` parsing support in
`core/src/config/permissions.rs`, so new config only recognizes
`:workspace_roots`
- renamed the built-in full-access permission profile id from
`:danger-no-sandbox` to `:danger-full-access`
- dropped unpublished `:danger-no-sandbox` support entirely, including
the old active-profile canonicalization path, and added explicit
rejection coverage for the legacy id
- introduced shared built-in permission-profile id constants in
`codex-rs/protocol/src/models.rs`
- updated `core`, `app-server`, and `tui` call sites that special-case
built-in profiles to use the shared constants and canonical ids
- updated tests and the Linux sandbox README to use `:workspace_roots` /
`:danger-full-access`
## Verification
I focused verification on the three places this rename can regress:
config parsing, active-profile identity surfaced back out of `core`, and
user/server call sites that special-case built-in profiles.
Targeted checks:
- `config::tests::default_permissions_can_select_builtin_profile_without_permissions_table`
- `config::tests::default_permissions_read_only_applies_additional_writable_roots_as_modifications`
- `config::tests::default_permissions_can_select_builtin_full_access_profile`
- `config::tests::legacy_danger_no_sandbox_is_rejected`
- `workspace_root` filtered `codex-core` tests
- `request_processors::thread_processor::thread_processor_tests::thread_processor_behavior_tests::requested_permissions_trust_project_uses_permission_profile_intent`
- `suite::v2::turn_start::turn_start_rejects_invalid_permission_selection_before_starting_turn`
- `status::tests::status_snapshot_shows_auto_review_permissions`
- `status::tests::status_permissions_full_disk_managed_with_network_is_danger_full_access`
- `app_server_session::tests::embedded_turn_permissions_use_active_profile_selection`
## Summary
- carry the per-turn extension data through RunningTask so abort
handling can rebuild SessionTaskContext
- update stale test ExtensionData::new() callsites to pass the turn id
## Testing
- Not run after PR branch creation; CI will cover.
## Summary
- Treat PowerShell stop-parsing token forms as unsupported in the
AST-backed command flattener.
- Add focused regressions at the parser layer and Windows command-safety
layer.
## Why
The command-safety parser lowers PowerShell AST elements into argv-like
words. Stop-parsing syntax preserves a native-command argument shape
that this lowering does not model, so these forms should stay on the
conservative unsupported path.
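The conservative path can be pictured with a flattener that refuses any input containing PowerShell's stop-parsing token `--%`. This is a simplified sketch over pre-split strings, not the AST-backed implementation; `flatten_command_words` is a hypothetical name and exact-token matching is an assumption about which forms are covered.

```rust
/// PowerShell's stop-parsing token.
const STOP_PARSING_TOKEN: &str = "--%";

/// Lower command elements into argv-like words, or return `None` when the
/// input uses stop-parsing syntax that this lowering does not model.
/// Simplification: operates on strings instead of PowerShell AST nodes.
pub fn flatten_command_words(elements: &[&str]) -> Option<Vec<String>> {
    if elements.iter().any(|word| *word == STOP_PARSING_TOKEN) {
        return None;
    }
    Some(elements.iter().map(|word| word.to_string()).collect())
}
```

Returning `None` rather than a best-effort argv keeps such commands out of any "known safe" classification built on top of the flattened words.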
## Validation
- `cargo fmt --manifest-path codex-rs/Cargo.toml --all --check`
- `cargo test --manifest-path codex-rs/Cargo.toml -p
codex-shell-command`
## Why
`--profile-v2 <name>` gives launchers and runtime entry points a named
profile config without making each profile duplicate the base user
config. The base `$CODEX_HOME/config.toml` still loads first, then
`$CODEX_HOME/<name>.config.toml` layers above it and becomes the active
writable user config for that session.
That keeps shared defaults, plugin/MCP setup, and managed/user
constraints in one place while letting a named profile override only the
pieces that need to differ.
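The layering order can be sketched with flat key/value maps. This is an illustration only: real config layers are TOML tables, name validation is omitted, and `user_config_paths` and `merge_layers` are hypothetical helpers.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// User config files in load order: the base config first, then the
/// selected profile file layered above it.
pub fn user_config_paths(codex_home: &Path, profile: Option<&str>) -> Vec<PathBuf> {
    let mut paths = vec![codex_home.join("config.toml")];
    if let Some(name) = profile {
        paths.push(codex_home.join(format!("{name}.config.toml")));
    }
    paths
}

/// Merge layers in order so later layers override earlier ones.
/// Simplification: flat string maps stand in for nested TOML tables.
pub fn merge_layers(layers: &[BTreeMap<String, String>]) -> BTreeMap<String, String> {
    let mut merged = BTreeMap::new();
    for layer in layers {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}
```

In this model a profile that sets only one key still inherits every other key from the base layer, which is the property that lets profiles stay small.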
## What Changed
- Added the shared `--profile-v2 <name>` runtime option with validated
plain names, now represented by `ProfileV2Name`.
- Extended config layer state so the base user config and selected
profile config are both `User` layers; APIs expose the active user layer
and merged effective user config.
- Threaded profile selection through runtime entry points: `codex`,
`codex exec`, `codex review`, `codex resume`, `codex fork`, and `codex
debug prompt-input`.
- Made user-facing config writes go to the selected profile file when
active, including TUI/settings persistence, app-server config writes,
and MCP/app tool approval persistence.
- Made plugin, marketplace, MCP, hooks, and config reload paths read
from the merged user config so base and profile layers both participate.
- Updated app-server config layer schemas to mark profile-backed user
layers.
## Limits
`--profile-v2` is still rejected for config-management subcommands such
as feature, MCP, and marketplace edits. Those paths remain tied to the
base `config.toml` until they have explicit profile-selection semantics.
Some adjacent background writes may still update base or global state
rather than the selected profile:
- marketplace auto-upgrade metadata
- automatic MCP dependency installs from skills
- remote plugin sync or uninstall config edits
- personality migration marker/default writes
## Verification
Added targeted coverage for profile name validation, layer
ordering/merging, selected-profile writes, app-server config writes,
session hot reload, plugin config merging, hooks/config fixture updates,
and MCP/app approval persistence.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- run registered TurnItemContributor hooks for parsed stream output
items
- plumb the active turn extension store into stream item handling
- preserve existing memory citation parsing as fallback after
contributors run
## Tests
- `cargo test -p codex-core stream_events_utils -- --nocapture`
- `just fmt`
- `just fix -p codex-core`
- `git diff --check`
## Why
`codex_tools::ToolExecutor` keeps a tool spec attached to its runtime
handler, but extension tools still carried a parallel
`ExtensionToolFuture` / `ExtensionToolExecutor` shape. That made
extension-owned tools look different from host tools even though
routing, registration, and execution need the same abstraction.
This PR makes the shared executor contract directly async and lets
extension tools implement it too, so host tools and extension tools can
move through the same registration path.
## What changed
- Changed `ToolExecutor::handle` to an `async fn` using `async-trait`,
and updated built-in tool handlers to implement the async trait
directly.
- Replaced the bespoke `ExtensionToolFuture` contract with a marker
`ExtensionToolExecutor` over `ToolExecutor<ToolCall, Output =
JsonToolOutput>`, re-exporting `ToolExecutor` from
`codex-extension-api`.
- Updated the memories extension tools to implement the shared executor
trait.
- Split tool-router construction into collected executors plus hosted
model specs, keeping hosted tools like web search and image generation
separate from executable handlers.
- Updated spec/router tests and extension-tool stubs for the new
executor shape.
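The marker-trait shape can be sketched as follows. To stay stdlib-only, this version uses a synchronous `handle`, whereas the change described here makes it `async` via `async-trait`. The struct fields, the blanket impl, `EchoTool`, and `run_extension_tool` are assumptions for illustration.

```rust
/// Simplified stand-ins for the call and output types.
pub struct ToolCall {
    pub name: String,
}

#[derive(Debug, PartialEq)]
pub struct JsonToolOutput(pub String);

/// Shared executor contract. Synchronous here for brevity; the real trait
/// is described as an `async fn` using `async-trait`.
pub trait ToolExecutor<Call> {
    type Output;
    fn handle(&self, call: Call) -> Result<Self::Output, String>;
}

/// Marker trait: any executor over `ToolCall` that yields `JsonToolOutput`.
pub trait ExtensionToolExecutor: ToolExecutor<ToolCall, Output = JsonToolOutput> {}

/// Assumed blanket impl so qualifying executors need no extra boilerplate.
impl<T> ExtensionToolExecutor for T where T: ToolExecutor<ToolCall, Output = JsonToolOutput> {}

/// Example extension-owned tool implementing only the shared contract.
pub struct EchoTool;

impl ToolExecutor<ToolCall> for EchoTool {
    type Output = JsonToolOutput;
    fn handle(&self, call: ToolCall) -> Result<JsonToolOutput, String> {
        Ok(JsonToolOutput(format!("{{\"echo\":\"{}\"}}", call.name)))
    }
}

/// A registration path that accepts any extension executor.
pub fn run_extension_tool<E: ExtensionToolExecutor>(
    executor: &E,
    call: ToolCall,
) -> Result<JsonToolOutput, String> {
    executor.handle(call)
}
```

The point of the marker is that an extension tool implements the same `ToolExecutor` trait as a host tool, and the extension-specific name only pins the call and output types.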
## Verification
- Not run locally.
## Why
This is a follow-up to #22573. This problem was surfaced in a code
review comment that I missed before merging the previous PR.
Fresh-session startup could prepare a model-availability NUX before
`app_server.start_thread(&config)` completed. If thread startup then
failed, the TUI never rendered the tooltip, but
`prepare_startup_tooltip_override(...)` had already persisted one of the
limited impressions.
## What Changed
- Move startup tooltip preparation inside the fresh-thread startup
branch, after `start_thread(...)` succeeds.
- Keep resume/fork paths unchanged.
- Remove the now-redundant
`should_prepare_startup_tooltip_override(...)` helper and its gate test.
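The ordering fix can be illustrated with two closures, where the second has the persistent side effect. This is a sketch: `start_fresh_session` and its parameters are hypothetical and only model the "prepare after `start_thread(...)` succeeds" rule.

```rust
/// Start a fresh session, preparing the startup tooltip only after thread
/// startup succeeds. `prepare_tooltip` stands in for the step that persists
/// one of the limited NUX impressions.
pub fn start_fresh_session<T, E>(
    start_thread: impl FnOnce() -> Result<T, E>,
    prepare_tooltip: impl FnOnce() -> Option<String>,
) -> Result<(T, Option<String>), E> {
    // On failure we return early, so no impression is ever recorded.
    let thread = start_thread()?;
    let tooltip = prepare_tooltip();
    Ok((thread, tooltip))
}
```

If `start_thread` fails, `prepare_tooltip` is never invoked, so a tooltip that was never rendered cannot consume an impression.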
## Summary
- Allow remote installed-plugin cache refresh to start whenever plugins
are enabled.
- Allow remote installed-plugin bundle sync to start whenever plugins
are enabled.
- Remove the extra local `remote_plugin_enabled` guard from those
background sync paths.
## Context
Server-side installed plugin state and optional bundle URL behavior are
owned by plugin-service `/public/plugins/installed`, so these local sync
paths only need the overall plugin enablement gate.
## Test plan
- `just fmt`
- `cargo test -p codex-core-plugins`
## Why
The TUI startup test surface had drifted into expensive, brittle
coverage:
- `tui/tests/suite/no_panic_on_startup.rs` was already ignored as flaky
while still spawning a PTY to exercise malformed exec-policy rules.
- `tui/tests/suite/model_availability_nux.rs` used a seeded session,
cursor-query spoofing, and repeated interrupts to verify a narrow
resume-path invariant.
- `app/tests.rs` had started accumulating unrelated startup and summary
coverage in one flat module even after the surrounding app code was
split into feature modules.
This keeps those behaviors covered while making the tests cheaper to
understand and less likely to rot. It also preserves the malformed-rules
regression from #8803 without requiring a terminal orchestration test.
## What changed
- Replaced the malformed `rules` startup PTY case with a direct
exec-policy loader regression:
[`rules_path_file_returns_read_dir_error`](21b6b5622f/codex-rs/core/src/exec_policy_tests.rs (L264-L284))
- Made the existing fresh-session-only startup tooltip behavior explicit
with
[`should_prepare_startup_tooltip_override`](21b6b5622f/codex-rs/tui/src/app/thread_routing.rs (L1272-L1279)),
then added focused coverage for the resume/fork gate and the persisted
NUX counter.
- Split startup and session-summary coverage out of
`tui/src/app/tests.rs` into dedicated modules so the test layout better
mirrors the current app architecture.
- Converted one single-message goal validation snapshot into semantic
assertions where layout was not the behavior under test.
- Removed the two PTY-heavy suite files that the narrower tests now
supersede.
## Verification
- `cargo test -p codex-core rules_path_file_returns_read_dir_error`
- `cargo test -p codex-tui startup_`
- `cargo test -p codex-tui session_summary_`
- `cargo test -p codex-tui
goal_slash_command_rejects_oversized_objective`
## Why
Reapplies https://github.com/openai/codex/pull/22386, which was
previously reverted.
It also introduces `remoteControl/enable` and `remoteControl/disable`
app-server APIs to toggle remote control on and off at runtime for a
given running app-server instance.
## What Changed
- Adds experimental v2 RPCs:
- `remoteControl/enable`
- `remoteControl/disable`
- Adds `RemoteControlRequestProcessor` and routes the new RPCs through
it instead of `ConfigRequestProcessor`.
- Adds named `RemoteControlHandle::enable`, `disable`, and `status`
methods.
- Makes `remoteControl/enable` return an error when sqlite state DB is
unavailable, while keeping enrollment/websocket failures as async status
updates.
- Adds `AppServerRuntimeOptions.remote_control_enabled` and hidden
`--remote-control` flags for `codex app-server` and `codex-app-server`.
- Updates managed daemon startup to use `codex app-server
--remote-control --listen unix://`.
- Marks `Feature::RemoteControl` as removed and ignores
`[features].remote_control`.
- Updates app-server README entries for the new remote-control methods.
## Why
`codex remote-control` manages the app-server daemon with
`remote_control` enabled, but it previously only exposed an implicit
start path. Once started, there was no obvious top-level
`remote-control` command for stopping the daemon; users had to know
about the lower-level `codex app-server daemon stop` command.
The startup failure for missing managed installs was also ambiguous.
`codex remote-control` and daemon bootstrap require the standalone Codex
install under `CODEX_HOME/packages/standalone/current/codex`, but the
old error only said to install Codex first, which is unclear when
another `codex` binary is already on PATH. Now we add an explicit
instruction for how to get the standalone Codex install.
## What changed
- Converts `codex remote-control` into a command group while preserving
bare `codex remote-control` as the existing start behavior.
- Adds `codex remote-control start` as the explicit start path.
- Adds `codex remote-control stop`, which maps to app-server daemon
stop.
- Updates the shared daemon managed-install error to name the missing
standalone path, explain why that install is required, provide the
installer command, and tell users to rerun the command they just tried.
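The command-group mapping can be sketched like this. It is a std-only illustration under the assumption that the arguments after `remote-control` arrive as a string slice; the real CLI presumably uses its argument parser, and `RemoteControlCommand` and `parse_remote_control` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum RemoteControlCommand {
    Start,
    Stop,
}

/// Maps the arguments that follow `codex remote-control` to a command.
pub fn parse_remote_control(args: &[&str]) -> Result<RemoteControlCommand, String> {
    match args {
        // Bare `codex remote-control` keeps the existing start behavior.
        [] | ["start"] => Ok(RemoteControlCommand::Start),
        // `stop` maps to the app-server daemon stop path.
        ["stop"] => Ok(RemoteControlCommand::Stop),
        other => Err(format!("unrecognized remote-control arguments: {other:?}")),
    }
}
```

Treating the empty argument list as `Start` is what keeps existing invocations working while the explicit subcommands are added.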
## Verification
- `cargo test -p codex-app-server-daemon`
- `cargo test -p codex-cli`
- `./target/debug/codex remote-control --help`
## Summary
Get rid of the `experimental_use_freeform_apply_patch` config option,
since it is now encoded in model config. No deprecation message since it
has been experimental this entire time.
## Testing
- [x] Updated unit tests
---------
Co-authored-by: Codex <noreply@openai.com>
All apps must be able to open the DB to proceed. Codex has trouble
generating new installation IDs in local mode when the DB cannot be
opened, whether because of race conditions or for any other reason.
Remove unnecessary prefix filtering from codex
## Test Plan
Test local cli build + make sure backend returns appropriate apps
```
cd ~/code/codex/codex-rs
cargo build -p codex-cli --bin codex
./target/debug/codex
```
Appropriate apps show up in my list
## Summary
- Upload unsigned macOS release binaries before signing so they remain
available from the workflow run if signing fails
- Add a manual `workflow_dispatch` option, `sign_macos`, defaulting to
`true`
- When `sign_macos=false`, skip macOS signing, signed-name macOS
artifacts, DMGs, npm/DotSlash/PyPI publishing, latest release marking,
and `latest-alpha-cli` updates
## Process
Not tested yet, but we should be able to run
```
gh workflow run rust-release.yml \
-R openai/codex \
--ref rust-v0.132.0 \
-f sign_macos=false
```
which then starts the `rust-release` workflow with `sign_macos=false`,
so the macOS binaries are not codesigned and no release is published
afterward.
## Why
`chatwidget.rs` is still carrying too many unrelated responsibilities in
one file. #22269 started a five-phase cleanup to move coherent behavior
domains into focused modules while keeping `chatwidget.rs` as the
composition layer. #22407 completed phase 2 by extracting input and
submission flow, #22433 completed phase 3 by extracting protocol,
replay, streaming, and tool lifecycle handling, and #22518 completed
phase 4 by extracting settings, popups, and status surfaces.
This PR is phase 5. It cleans up the remaining constructor and
orchestration code now that the larger behavior domains have moved out,
leaving `chatwidget.rs` much closer to the composition layer the cleanup
was aiming for. This is once again a mechanical movement of existing
functions. No functional changes.
## What Changed
- Added focused modules for widget construction and initial wiring,
session configuration flow, key/composer interaction routing, review
popup orchestration, desktop notification coalescing, and render
composition.
- Moved the remaining constructor, session setup, interaction,
notification, review picker, and rendering helpers out of
`codex-rs/tui/src/chatwidget.rs`.
- Preserved the existing startup/session behavior, keyboard handling,
review picker flow, notification priority behavior, and render
composition while shrinking the central widget module substantially.
- Left `codex-rs/tui/src/chatwidget.rs` as the registration and
composition surface for the extracted behavior modules.
## Cleanup Phases
The five-phase cleanup plan from #22269 is:
1. Phase 1: mechanical helper and state moves. Completed in #22269.
2. Phase 2: extract input and submission flow, including queued user
messages, shell prompt submission, pending steer restoration, and thread
input snapshot/restore behavior. Completed in #22407.
3. Phase 3: extract protocol, replay, streaming, and tool lifecycle
handling, while preserving active-cell grouping, transcript
invalidation, interrupt deferral, and final-message separator behavior.
Completed in #22433.
4. Phase 4: extract settings, popups, and status surfaces, including
model/reasoning/collaboration/personality popups, permission prompts,
rate-limit UI, and connectors helpers. Completed in #22518.
5. Phase 5: clean up the remaining constructor and orchestration code
once the larger behavior domains have moved out, leaving `chatwidget.rs`
as the composition layer. This PR.
## Verification
- `cargo check -p codex-tui`
- `cargo test -p codex-tui chatwidget::tests::popups_and_settings`
- `cargo test -p codex-tui chatwidget::tests::plan_mode`
- `cargo test -p codex-tui chatwidget::tests::review_mode`
- `cargo test -p codex-tui chatwidget::tests::status_and_layout`
`cargo test -p codex-tui` also compiles and begins running, but aborts
in the unchanged app-side test
`app::tests::discard_side_thread_keeps_local_state_when_server_close_fails`
with the same reproducible stack overflow noted in phase 4.
## Summary
`/collab` was intentionally removed in
[#12012](https://github.com/openai/codex/pull/12012), but the
TUI/app-server migration accidentally brought that slash-command path
back. This restores the earlier product decision so the TUI no longer
advertises or dispatches `/collab`. This command was redundant because
it did the same thing as `/plan` but in a less-intuitive way.
## What Changed
- Remove `SlashCommand::Collab` from the TUI slash-command surface.
- Delete the picker and app-event plumbing that only existed to service
`/collab`.
- Remove obsolete TUI test coverage for the deleted picker flow.
# Why
`PreToolUse.additionalContext` became model-visible after #20692, but
the hook-output spilling path from #21069 never picked up that newer
lane. As a result, oversized `PreToolUse` context could bypass the
truncation/spill treatment that already applies to the other hook
outputs Codex forwards to the model.
# What
- Run `PreToolUseOutcome.additional_contexts` through
`maybe_spill_texts(...)`
- Add an integration test proving a large `PreToolUse.additionalContext`
is replaced with a truncated preview plus spill-file pointer, while the
full text is preserved on disk.
## Why
`multi_agent_v2` already allowed configuring the minimum `wait_agent`
timeout, but the default timeout and upper bound were still hard-coded.
That made it hard to tune waits for subagent mailbox activity in
sessions that need either faster wakeups or longer waits, and it meant
the model-visible `wait_agent` schema could not fully reflect the
resolved runtime limits.
## What Changed
- Added `features.multi_agent_v2.max_wait_timeout_ms` and
`features.multi_agent_v2.default_wait_timeout_ms` alongside the existing
`min_wait_timeout_ms` setting.
- Validated all three timeouts in config as `0..=3_600_000`, with
`min_wait_timeout_ms <= default_wait_timeout_ms <= max_wait_timeout_ms`.
- Thread and review session tool config now passes the resolved
min/default/max values into the `wait_agent` tool schema.
- `wait_agent` now uses the configured default when `timeout_ms` is
omitted and rejects explicit values outside the configured min/max range
instead of silently clamping them.
- Updated the generated config schema and config-lock test coverage for
the new fields.
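The validation and resolution rules above can be sketched as follows. This is a minimal model under assumptions: `WaitTimeouts`, `new`, and `resolve` are hypothetical names, and the values are modeled as `u64` so the lower bound of zero holds by construction.

```rust
/// Upper bound for every configured timeout, from the `0..=3_600_000` range.
pub const MAX_ALLOWED_TIMEOUT_MS: u64 = 3_600_000;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct WaitTimeouts {
    pub min_ms: u64,
    pub default_ms: u64,
    pub max_ms: u64,
}

impl WaitTimeouts {
    /// Validates the range of each value and the min <= default <= max ordering.
    pub fn new(min_ms: u64, default_ms: u64, max_ms: u64) -> Result<Self, String> {
        let named = [("min", min_ms), ("default", default_ms), ("max", max_ms)];
        for (name, value) in named.iter() {
            if *value > MAX_ALLOWED_TIMEOUT_MS {
                return Err(format!("{name}_wait_timeout_ms must be at most {MAX_ALLOWED_TIMEOUT_MS}"));
            }
        }
        if min_ms > default_ms || default_ms > max_ms {
            return Err("expected min <= default <= max".to_string());
        }
        Ok(Self { min_ms, default_ms, max_ms })
    }

    /// Uses the default when `timeout_ms` is omitted; rejects instead of clamping.
    pub fn resolve(&self, timeout_ms: Option<u64>) -> Result<u64, String> {
        match timeout_ms {
            None => Ok(self.default_ms),
            Some(value) if (self.min_ms..=self.max_ms).contains(&value) => Ok(value),
            Some(value) => Err(format!(
                "timeout_ms {value} is outside {}..={}",
                self.min_ms, self.max_ms
            )),
        }
    }
}
```

Rejecting out-of-range values keeps the runtime behavior aligned with the limits advertised in the tool schema, whereas clamping would silently change what the caller asked for.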
## Why
On Windows, elevated sandboxed commands run under a dedicated sandbox
account while `HOME` / `USERPROFILE` can still point at the real user's
profile directory. For PowerShell login shells, that combination can
make the sandbox account try to load the real user's PowerShell profile
script. If the sandbox account's execution policy differs from the real
user's policy, startup can emit profile-loading errors before the
requested command runs.
For this backend, loading the profile is not a faithful user login
shell: it is cross-account profile execution. Treating these PowerShell
invocations as non-login shells avoids that invalid startup path.
## Why This Happens Late
The normal `login` decision is resolved when shell argv is created, but
that point is too early to make this Windows sandbox-specific decision.
At argv creation time we do not yet know the actual sandbox attempt that
will run the command. A turn can include sandboxed and unsandboxed
attempts, and a broad turn-level override would also affect Full Access
commands where the user's profile should remain available.
Instead, this change carries the selected `ShellType` alongside the argv
and applies the `-NoProfile` adjustment in the shell runtimes once the
`SandboxAttempt` is known. That keeps the override scoped to actual
`WindowsRestrictedToken` attempts with `WindowsSandboxLevel::Elevated`.
The runtime uses the selected shell metadata rather than re-detecting
PowerShell from argv. That avoids brittle parsing and covers PowerShell
invocation shapes such as `-EncodedCommand`.
## What Changed
- Carry selected shell metadata through `exec_command` / unified exec
requests and shell tool requests.
- Insert `-NoProfile` for PowerShell commands only when the runtime is
about to execute a sandboxed elevated Windows attempt.
- Add focused unit coverage for elevated Windows PowerShell,
`-EncodedCommand`, existing `-NoProfile`, legacy restricted-token
attempts, unsandboxed attempts, and non-PowerShell commands.
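A minimal sketch of the late adjustment, assuming a flattened `SandboxAttempt` enum in place of the real `WindowsRestrictedToken` / `WindowsSandboxLevel::Elevated` pairing. The function name, the enum variants, and the choice to insert the flag directly after the program name are illustrative assumptions.

```rust
#[derive(Clone, Copy, PartialEq)]
pub enum ShellType {
    PowerShell,
    Cmd,
    Bash,
}

/// Simplified stand-in for the sandbox attempt selected at run time.
#[derive(Clone, Copy, PartialEq)]
pub enum SandboxAttempt {
    Unsandboxed,
    WindowsRestrictedTokenLegacy,
    WindowsRestrictedTokenElevated,
}

/// Returns argv with `-NoProfile` added only for elevated Windows PowerShell runs.
pub fn disable_powershell_profile(
    shell: ShellType,
    attempt: SandboxAttempt,
    argv: &[String],
) -> Vec<String> {
    let mut adjusted = argv.to_vec();
    // Decide from the selected shell metadata, not by re-parsing argv.
    let applies = shell == ShellType::PowerShell
        && attempt == SandboxAttempt::WindowsRestrictedTokenElevated;
    // PowerShell flags are case-insensitive, so compare accordingly.
    let already_present = argv
        .iter()
        .skip(1)
        .any(|arg| arg.eq_ignore_ascii_case("-NoProfile"));
    if applies && !already_present && !adjusted.is_empty() {
        adjusted.insert(1, "-NoProfile".to_string());
    }
    adjusted
}
```

Since the decision keys off `ShellType` rather than the argument list, an `-EncodedCommand` invocation is handled the same way as a plain `-Command` one.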
## Verification
- `cargo test -p codex-core disable_powershell_profile_tests`
- `cargo test -p codex-core test_get_command`
- `cargo clippy --fix --tests --allow-dirty --allow-no-vcs -p
codex-core`
A full `cargo test -p codex-core` run was also attempted during
development, but it still hit an unrelated stack overflow in
`agent::control` tests before reaching this area.
## Why
Users and support need a single command that captures the local Codex
runtime, configuration, auth, terminal, network, and state shape without
asking the user to know which diagnostic depth to choose first. `codex
doctor` now runs the useful checks by default and makes the detailed
human output the default because the command is usually run when someone
already needs context.
The command also targets concrete support failure modes we have seen
while iterating on the design:
- update-target mismatches like #21956, where the installed package
manager target can differ from the running executable
- terminal and multiplexer issues that depend on `TERM`, tmux/zellij
state, color handling, and TTY metadata
- provider-specific HTTP/WebSocket connectivity, including ChatGPT
WebSocket handshakes and API-key/provider endpoint reachability
- local state/log SQLite integrity problems and large rollout
directories
- feedback reports that need an attached, redacted diagnostic snapshot
without asking the user to run a second command
## What Changed
- Adds `codex doctor` as a grouped CLI diagnostic report with default
detailed output and `--summary` for the compact view.
- Adds stable report sections for Environment, Configuration, Updates,
Connectivity, and Background Server, plus a top Notes block that
promotes anomalies such as available updates, large rollout directories,
optional MCP issues, and mixed auth signals.
- Adds runtime provenance, install consistency, bundled/system search
readiness, terminal/multiplexer metadata, `config.toml` parse status,
auth mode details, sandbox details, feature flag summaries, update
cache/latest-version state, app-server daemon state, SQLite integrity
checks, rollout statistics, and provider-aware network diagnostics.
- Adds ChatGPT WebSocket diagnostics that report the negotiated HTTP
upgrade as `HTTP 101 Switching Protocols` and include timeout, DNS,
auth, and provider context in detailed output.
- Makes reachability provider-aware: API-key OpenAI setups check the API
endpoint, ChatGPT auth checks the ChatGPT path, and custom/AWS/local
providers check configured HTTP endpoints when available.
- Adds structured, redacted JSON output where `checks` is keyed by check
id and `details` is a key/value object for support tooling.
- Integrates doctor with feedback uploads by attaching a best-effort
`codex-doctor-report.json` report and adding derived Sentry tags for
overall status and failing/warning checks.
- Updates the TUI feedback consent copy so users can see that the doctor
report is included when logs/diagnostics are uploaded.
- Updates the CLI bug issue template to ask reporters for `codex doctor
--json` and render pasted reports as JSON.
## Example Output
The examples below are sanitized from local smoke runs with `--no-color`
so the structure is reviewable in plain text.
### `codex doctor`
```text
Codex Doctor v0.0.0 · macos-aarch64
Notes
↑ updates 0.130.0 available (current 0.0.0, dismissed 0.128.0)
⚠ rollouts 1,526 active files · 2.53 GB on disk
⚠ mcp MCP configuration has optional issues
⚠ auth mixed auth signals: ChatGPT login plus API key env var; HTTP reachability uses API-key mode
─────────────────────────────────────────────────────────────
Environment
✓ runtime local debug build
version 0.0.0
install method other
commit unknown
executable ~/code/codex.fcoury-doct…x-rs/target/debug/codex
✓ install consistent
context other
managed by npm: no · bun: no · package root —
PATH entries (2) ~/.local/share/mise/installs/node/24/bin/codex
~/.local/share/mise/shims/codex
✓ search ripgrep 15.1.0 (system, `rg`)
✓ terminal Ghostty 1.3.2-main-+b0f827665 · tmux 3.6a · TERM=xterm-256color
terminal Ghostty
TERM_PROGRAM ghostty
terminal version 1.3.2-main-+b0f827665
TERM xterm-256color
multiplexer tmux 3.6a
tmux extended-keys on
tmux allow-passthrough on
tmux set-clipboard on
✓ state databases healthy
CODEX_HOME ~/.codex (dir)
state DB ~/.codex/state_5.sqlite (file) · integrity ok
log DB ~/.codex/logs_2.sqlite (file) · integrity ok
active rollouts 1,526 files · 2.53 GB (avg 1.70 MB)
archived rollouts 8 files · 3.84 MB (avg 491.11 KB)
Configuration
✓ config loaded
model gpt-5.5 · openai
cwd ~/code/codex.fcoury-doctor/codex-rs
config.toml ~/.codex/config.toml
config.toml parse ok
MCP servers 1
feature flags 36 enabled · 7 overridden (full list with --all)
overrides code_mode, code_mode_only, memories, chronicle, goals, remote_control, prevent_idle_sleep
✓ auth auth is configured
auth storage mode File
auth file ~/.codex/auth.json
auth env vars present OPENAI_API_KEY
stored auth mode chatgpt
stored API key false
stored ChatGPT tokens true
stored agent identity false
⚠ mcp MCP configuration has optional issues — Set the missing MCP env vars or disable the affected server.
configured servers 1
disabled servers 0
streamable_http servers 1
optional reachability openaiDeveloperDocs: https://developers.openai.com/mcp (HEAD connect failed; GET connect failed)
✓ sandbox restricted fs + restricted network · approval OnRequest
approval policy OnRequest
filesystem sandbox restricted
network sandbox restricted
Connectivity
✓ network network-related environment looks readable
✓ websocket connected (HTTP 101 Switching Protocols) · 15s timeout
model provider openai
provider name OpenAI
wire API responses
supports websockets true
connect timeout 15000 ms
auth mode chatgpt
endpoint wss://chatgpt.com/backend-api/<redacted>
DNS 2 IPv4, 2 IPv6, first IPv6
handshake result HTTP 101 Switching Protocols
✗ reachability one or more required provider endpoints are unreachable over HTTP — Check proxy, VPN, firewall, DNS, and custom CA configuration.
reachability mode API key auth
openai API https://api.openai.com/v1 connect failed (required)
Background Server
○ app-server not running (ephemeral mode)
─────────────────────────────────────────────────────────────
11 ok · 1 idle · 4 notes · 1 warn · 1 fail failed
--summary compact output --all expand truncated lists
--json redacted report
```
### `codex doctor --summary`
```text
Codex Doctor v0.0.0 · macos-aarch64
Notes
↑ updates 0.130.0 available (current 0.0.0, dismissed 0.128.0)
⚠ rollouts 1,526 active files · 2.53 GB on disk
⚠ mcp MCP configuration has optional issues
⚠ auth mixed auth signals: ChatGPT login plus API key env var; HTTP reachability uses API-key mode
─────────────────────────────────────────────────────────────
Environment
✓ runtime local debug build
✓ install consistent
✓ search ripgrep 15.1.0 (system, `rg`)
✓ terminal Ghostty 1.3.2-main-+b0f827665 · tmux 3.6a · TERM=xterm-256color
✓ state databases healthy
Configuration
✓ config loaded
✓ auth auth is configured
⚠ mcp MCP configuration has optional issues — Set the missing MCP env vars or disable the affected server.
✓ sandbox restricted fs + restricted network · approval OnRequest
Updates
✓ updates update configuration is locally consistent
Connectivity
✓ network network-related environment looks readable
✓ websocket connected (HTTP 101 Switching Protocols) · 15s timeout
✗ reachability one or more required provider endpoints are unreachable over HTTP — Check proxy, VPN, firewall, DNS, and custom CA configuration.
Background Server
○ app-server not running (ephemeral mode)
─────────────────────────────────────────────────────────────
11 ok · 1 idle · 4 notes · 1 warn · 1 fail failed
Run codex doctor without --summary for detailed diagnostics.
--all expand truncated lists --json redacted report
```
### `codex doctor --json` shape
```json
{
"schema_version": 1,
"overall_status": "fail",
"checks": {
"runtime.provenance": {
"id": "runtime.provenance",
"category": "Environment",
"status": "ok",
"summary": "local debug build",
"details": {
"version": "0.0.0",
"install method": "other",
"commit": "unknown"
}
},
"sandbox.helpers": {
"id": "sandbox.helpers",
"category": "Configuration",
"status": "ok",
"summary": "restricted fs + restricted network · approval OnRequest",
"details": {
"approval policy": "OnRequest",
"filesystem sandbox": "restricted",
"network sandbox": "restricted"
}
}
}
}
```
### `/feedback` new sentry attachment
<img width="938" height="798" alt="CleanShot 2026-05-13 at 15 36 14"
src="https://github.com/user-attachments/assets/715e62e0-d7b4-4fea-a35a-fd5d5d33c4c0"
/>
### New section in CLI issue template
<img width="1164" height="435" alt="CleanShot 2026-05-13 at 15 47 24"
src="https://github.com/user-attachments/assets/9081dc25-a28c-4afa-8ba1-e299c2b4031d"
/>
## How to Test
1. Run `cargo run --bin codex -- doctor --no-color`.
2. Confirm the detailed report is the default and includes promoted
Notes, grouped sections, terminal details, state DB integrity, rollout
stats, provider reachability, WebSocket diagnostics, and app-server
status.
3. Run `cargo run --bin codex -- doctor --summary --no-color`.
4. Confirm the compact view keeps the same sections and summary counts
but omits detailed key/value rows.
5. Run `cargo run --bin codex -- doctor --json`.
6. Confirm the output is redacted JSON, `checks` is an object keyed by
check id, and each check's `details` is a key/value object.
7. Preview the CLI bug issue template and confirm the `Codex doctor
report` field appears after the terminal field, asks for `codex doctor
--json`, and renders pasted output as JSON.
8. Start a feedback flow that includes logs.
9. Confirm the upload consent copy lists `codex-doctor-report.json`
alongside the log attachments.
Targeted tests:
- `cargo test -p codex-cli doctor`
- `cargo test -p codex-app-server
doctor_report_tags_summarize_status_counts`
- `cargo test -p codex-feedback`
- `cargo test -p codex-tui feedback_view`
- `just argument-comment-lint`
- `git diff --check`
Fixes #20587, reported by @noeljackson.
This prevents the TUI wrapping code from panicking when `textwrap`
returns a borrowed slice that does not point into the original source
text. The fix follows the direction proposed by @misrtjakub in the issue
comment: validate the borrowed slice pointer range first, and fall back
to the existing owned-line mapper when the slice is external.
- Guards borrowed wrapped slices before converting pointer offsets into
byte ranges.
- Reuses the existing owned-line range recovery path for external
borrowed slices.
- Adds coverage for rejecting borrowed slices outside the source text.
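The guard can be sketched roughly like this. `borrowed_slice_range` is a hypothetical name, not the function in the wrapping module; the sketch only shows the pointer-range check that has to pass before offsets are turned into a byte range.

```rust
use std::ops::Range;

/// Returns the byte range of `slice` within `source`, or `None` when the
/// slice does not point into `source` at all.
pub fn borrowed_slice_range(source: &str, slice: &str) -> Option<Range<usize>> {
    let source_start = source.as_ptr() as usize;
    let source_end = source_start.checked_add(source.len())?;
    let slice_start = slice.as_ptr() as usize;
    let slice_end = slice_start.checked_add(slice.len())?;

    // Validate the pointer range first; subtracting without this check is
    // what would underflow or produce an out-of-bounds range.
    if slice_start < source_start || slice_end > source_end {
        return None;
    }
    Some(slice_start - source_start..slice_end - source_start)
}
```

A caller would treat `None` as the signal to fall back to the owned-line mapping path instead of indexing into the source text.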
End-user testing steps:
- Start Codex in TUI mode under a PTY wrapper that can inject stdin
after startup.
- Inject `\x1b[200~test message\x1b[201~\r` after the TUI is ready.
- Confirm Codex does not panic and the pasted text is handled normally.
Local validation:
- `cargo test -p codex-tui wrapping::tests::`
- `cargo test -p codex-tui -- --skip
status::tests::status_permissions_full_disk_managed_with_network_is_danger_full_access
--skip
status::tests::status_permissions_full_disk_managed_without_network_is_external_sandbox`
This switches TUI plugin mentions to use app-server `plugin/list` for
plugin inventory and metadata instead of `PluginManager`, while keeping
the same mention-eligibility filters as before.
Same filters as before:
- Only plugins in the current config / cwd scope.
- Only installed and enabled plugins.
- Only plugins that actually expose a capability, meaning at least one
skill, MCP server, or app connector.
- Uses `plugin/list` for the mention names/descriptions
# Why
Plugin-bundled hooks are already wired through the plugin manager,
session setup, and app-server hook listing paths. Keeping `plugin_hooks`
disabled by default means users still need an explicit feature opt-in
before that existing behavior participates in normal plugin loading.
# What
- mark `plugin_hooks` as stable and enable it by default
- add feature-registry test coverage for the new default/stage pairing
Validation:
- `cargo test -p codex-features`
- `just fmt`
## Summary
- Add a deterministic callback-id path segment to local MCP OAuth
redirect URIs before starting authorization.
- Derive the callback id from the normalized MCP server URL and encode
it as a 12-character URL-safe hash.
- Reuse the existing exact callback-path validation so OAuth completion
only succeeds on the callback path that was sent in the redirect URI.
## Context
Slack thread:
https://openai.slack.com/archives/C087WB3AGCR/p1777480566571699
That thread calls out the OAuth mix-up class of issue for MCP servers.
The connector/App Connect flow already has a callback_id concept that
binds the OAuth callback URL to the MCP app/server identity. Codex
desktop's local MCP OAuth flow was still using a generic local callback
path like `/callback`, so this PR adds the same shape to the shared
local MCP OAuth helper.
## Behavior
Before this change, local MCP OAuth used:
- default local callback URL: `http://127.0.0.1:<port>/callback`
- configured callback URL: `<configured callback URL>` unchanged
After this change, Codex appends a deterministic callback-id segment:
- default local callback URL:
`http://127.0.0.1:<port>/callback/<callback_id>`
- configured callback URL: `<configured callback path>/<callback_id>`
The local callback server already compares the incoming request path
against the path from the redirect URI. By appending the callback id
before both authorization and callback validation, callbacks that arrive
on the old generic path or a mismatched callback-id path are rejected.
The callback id is bound to the MCP endpoint URL, including path and
query, so path-based multi-tenant MCP deployments on the same origin do
not share a callback path. URL fragments are ignored because they are
not sent to the server.
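To make the binding concrete, here is a std-only sketch. It rests on loud assumptions: the description does not name the hash or the encoding, so this uses FNV-1a truncated to 48 bits and rendered as 12 hex characters purely for illustration (a real implementation would more likely use a cryptographic hash), and normalization is reduced to dropping the fragment. All function names are hypothetical.

```rust
/// 64-bit FNV-1a, used here only as a simple deterministic hash (assumption).
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Derives a 12-character URL-safe id from the server URL, ignoring fragments.
pub fn callback_id(server_url: &str) -> String {
    let without_fragment = server_url.split('#').next().unwrap_or(server_url);
    let hash = fnv1a_64(without_fragment.as_bytes());
    // Keep the low 48 bits so the hex rendering is exactly 12 characters.
    format!("{:012x}", hash & 0xffff_ffff_ffff)
}

/// Builds the default local callback path with the callback-id segment.
pub fn callback_path(server_url: &str) -> String {
    format!("/callback/{}", callback_id(server_url))
}

/// Exact path comparison, as the local callback server already performs.
pub fn callback_path_matches(expected_path: &str, request_path: &str) -> bool {
    expected_path == request_path
}
```

Because the id is derived from the path and query as well as the origin, two tenants on the same host get different callback paths, and a request on the bare `/callback` path no longer matches.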
The change lives in `codex-rmcp-client`, so it covers both the normal
desktop MCP OAuth login path and silent/plugin-triggered MCP OAuth login
paths that use the same `perform_oauth_login_*` helpers.
## Scope and non-goals
- This does not change the app-server protocol or desktop webview
request shape.
- This does not implement RFC 9207 `iss` validation; issuer validation
is still useful when providers return `iss`.
- This does not make arbitrary untrusted MCP servers safe to use. It
specifically adds callback URL binding for the local MCP OAuth flow.
## Validation
- `cargo fmt --all`
- `cargo test -p codex-rmcp-client perform_oauth_login`
## Why
`TurnContext::cwd` is deprecated in favor of resolving paths from the
selected turn environment cwd. A few filesystem-oriented paths were
still constructing sandbox context from the legacy cwd and then mutating
it afterward, or resolving local file paths through the deprecated
helper.
## What changed
- Make `TurnContext::file_system_sandbox_context` take the trusted cwd
explicitly.
- Pass the selected turn environment cwd directly from `apply_patch` and
`view_image` call sites.
- Restrict `spawn_agents_on_csv` to exactly one local environment and
resolve input/output CSV paths from that local environment cwd.
- Remove a redundant test setup assignment that only synchronized
deprecated `TurnContext::cwd` with a replaced config.
## Validation
- `cargo test -p codex-core view_image`
- `cargo test -p codex-core
maybe_persist_mcp_tool_approval_writes_project_config_for_project_server`
- `cargo test -p codex-core parse_csv_supports_quotes_and_commas`
- `git diff --check`
## Summary
- split the single PR-blocking Bazel Windows test leg into four Windows
shard jobs
- preserve the existing required Windows Bazel check name with a
lightweight aggregate gate
- keep Linux/macOS Bazel test jobs and the separate Windows
clippy/release jobs unchanged
## Why
The ordinary PR Windows Bazel test leg was one GitHub Actions job, so
Bazel only had in-job parallelism. This gives that lane real job-level
fanout across separate Windows hosts while keeping the target set
disjoint via stable label hashing.
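The partitioning idea can be sketched as below. The actual workflow logic is not shown in this description and is probably not written in Rust; the hash choice (FNV-1a) and the function names are assumptions. The point of a hand-written hash is stability: the assignment must not change between runs or toolchain versions.

```rust
/// 64-bit FNV-1a: a fixed algorithm, so results are stable across runs.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Assigns a test target label to exactly one shard in `0..shard_count`.
pub fn shard_for_label(label: &str, shard_count: u64) -> u64 {
    assert!(shard_count > 0, "shard_count must be positive");
    fnv1a_64(label.as_bytes()) % shard_count
}

/// Selects the labels that belong to one shard job.
pub fn labels_for_shard<'a>(labels: &[&'a str], shard: u64, shard_count: u64) -> Vec<&'a str> {
    labels
        .iter()
        .copied()
        .filter(|label| shard_for_label(label, shard_count) == shard)
        .collect()
}
```

Each label hashes to a single shard, so the shard target sets are disjoint and together cover every target, although the counts per shard are only roughly even.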
## Evidence
- final pre-rebase green run: `25774733562`
- Windows shard target counts: `61/212`, `48/212`, `52/212`, `51/212`
- Windows test fanout completed in about 7m29s versus a recent
monolithic median around 22m26s
## Notes
- this is scoped to the Bazel Windows test leg only
- each shard keeps the existing Windows cross-compile/RBE path and
restores the former monolithic Windows test cache
- shard jobs do not upload duplicate repository caches after test work,
keeping cache cleanup off the PR-blocking shard path
- no local validation run; relying on GitHub Actions for the
workflow-shaped check
Co-authored-by: Codex <noreply@openai.com>
## Why
Remote control starts by letting `codex-backend` initialize against the
app-server as an infrastructure health/proxy client before the real
remote client connects. App-server initialization also sets the
process-wide `originator` from `client_info.name`, so `codex-backend`
could become the sticky originator for later model/API requests even
after the real client initialized.
## What changed
- Treat `codex-backend` as a non-originating initialize client,
alongside the existing `codex_app_server_daemon` probe client.
- Preserve normal per-connection initialize behavior, including session
metadata and initialize analytics.
- Add regression coverage that verifies `codex-backend` initialize does
not replace the default originator.
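A simplified model of the rule, assuming the originator is "first real client wins" state. `ProcessState`, `on_initialize`, and the `DEFAULT_ORIGINATOR` value are hypothetical; only the two client names come from the description above.

```rust
/// Infrastructure clients whose initialize must not set the originator.
const NON_ORIGINATING_CLIENTS: [&str; 2] = ["codex-backend", "codex_app_server_daemon"];

/// Placeholder default; the real default originator value is not shown here.
pub const DEFAULT_ORIGINATOR: &str = "default-originator";

#[derive(Default)]
pub struct ProcessState {
    originator: Option<String>,
}

impl ProcessState {
    /// Handles an initialize request from a client with the given name.
    pub fn on_initialize(&mut self, client_name: &str) {
        if NON_ORIGINATING_CLIENTS.contains(&client_name) {
            // Probe and proxy clients still initialize, but leave the originator alone.
            return;
        }
        // The originator is sticky: only the first originating client sets it.
        if self.originator.is_none() {
            self.originator = Some(client_name.to_string());
        }
    }

    pub fn originator(&self) -> &str {
        self.originator.as_deref().unwrap_or(DEFAULT_ORIGINATOR)
    }
}
```

With the exclusion in place, the health/proxy connection can initialize first without claiming the slot that the real remote client needs.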
## Testing
- `cargo test -p codex-app-server --test all
initialize_codex_backend_does_not_override_originator`
## Summary
This config flag appears to have been broken (effectively a no-op) for
quite some time, since https://github.com/openai/codex/pull/8850. Let's
simplify and get rid of it.
## Testing
- [x] Updated unit tests
## Why
Elevated Windows sandbox setup currently assumes that the firewall rules
it writes will take effect. On managed Windows hosts, local firewall
policy changes can be ignored or only partially apply across the active
profiles, which means setup can appear to succeed without providing the
expected network isolation.
## What changed
- Query `INetFwPolicy2::LocalPolicyModifyState` before configuring the
elevated sandbox firewall rules.
- Fail setup when Windows reports that local firewall policy edits are
ineffective or only apply to some current profiles.
- Surface that condition with a dedicated
`helper_firewall_policy_ineffective` setup error code so support and
IT-facing diagnostics can distinguish it from COM access failures.
- Add focused coverage for effective policy, group-policy override, and
partial-profile coverage cases.
## Testing
- `cargo test -p codex-windows-sandbox --bin
codex-windows-sandbox-setup`
## Why
`chatwidget.rs` is still carrying too many unrelated responsibilities in
one file. #22269 started a five-phase cleanup to move coherent behavior
domains into focused modules while keeping `chatwidget.rs` as the
composition layer. #22407 completed phase 2 by extracting input and
submission flow, and #22433 completed phase 3 by extracting protocol,
replay, streaming, and tool lifecycle handling.
This PR is phase 4. It keeps moving high-churn UI coordination out of
the central widget by extracting settings, popups, and status surfaces
without changing the visible behavior those flows already provide. This
is once again a mechanical movement of existing functions. No functional
changes.
## What Changed
- Added focused modules for runtime settings/model coordination,
model/reasoning/collaboration popups,
settings/personality/theme/audio/experimental popups, permission
prompts, status setup/output controls, and Windows sandbox prompt flows.
- Moved the remaining rate-limit nudge/status helpers and connectors
popup/loading/update helpers into their existing focused modules.
- Preserved the existing picker flows, approval behavior, status/title
setup previews, rate-limit notices, and connectors/app list behavior
while shrinking `chatwidget.rs` back toward orchestration.
- Left `codex-rs/tui/src/chatwidget.rs` as the registration and
composition surface for these extracted behaviors.
## Cleanup Phases
The five-phase cleanup plan from #22269 is:
1. Phase 1: mechanical helper and state moves. Completed in #22269.
2. Phase 2: extract input and submission flow, including queued user
messages, shell prompt submission, pending steer restoration, and thread
input snapshot/restore behavior. Completed in #22407.
3. Phase 3: extract protocol, replay, streaming, and tool lifecycle
handling, while preserving active-cell grouping, transcript
invalidation, interrupt deferral, and final-message separator behavior.
Completed in #22433.
4. Phase 4: extract settings, popups, and status surfaces, including
model/reasoning/collaboration/personality popups, permission prompts,
rate-limit UI, and connectors helpers. This PR.
5. Phase 5: clean up the remaining constructor and orchestration code
once the larger behavior domains have moved out, leaving `chatwidget.rs`
as the composition layer.
## Verification
- `cargo check -p codex-tui`
- `cargo test -p codex-tui chatwidget::tests::permissions`
- `cargo test -p codex-tui chatwidget::tests::status_surface_previews`
- `cargo test -p codex-tui chatwidget::tests::popups_and_settings`
- `cargo test -p codex-tui chatwidget::tests::status_and_layout`
`cargo test -p codex-tui` also compiles and begins running, but aborts
in the unchanged app-side test
`app::tests::discard_side_thread_keeps_local_state_when_server_close_fails`
with a reproducible stack overflow.
## Why
`TurnContext::cwd` and `TurnContext::resolve_path` are being phased out
in favor of using the selected turn environment cwd directly.
Deprecating both APIs makes any new direct dependency visible while
preserving the existing migration path for current callers.
## What Changed
- Marked `TurnContext::cwd` and `TurnContext::resolve_path` as
deprecated with guidance to use the selected turn environment cwd
instead.
- Added exact `#[allow(deprecated)]` suppressions at each existing
direct usage site, including tests, rather than adding crate-wide
suppression.
- Kept the change behavior-preserving: current cwd reads, writes, and
path resolution continue to use the same values.
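The deprecate-then-suppress pattern looks roughly like this. The `TurnContext` below is a hypothetical stand-in with an assumed `environment_cwd` field and assumed method signatures; only the method names and the use of `#[allow(deprecated)]` at individual call sites come from this description.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the real `TurnContext`.
struct TurnContext {
    environment_cwd: PathBuf,
}

impl TurnContext {
    #[deprecated(note = "use the selected turn environment cwd instead")]
    fn cwd(&self) -> &Path {
        &self.environment_cwd
    }

    #[deprecated(note = "use the selected turn environment cwd instead")]
    fn resolve_path(&self, path: &str) -> PathBuf {
        self.environment_cwd.join(path)
    }
}

/// Existing caller: the suppression is scoped to this one function, so any
/// new use elsewhere in the crate still triggers the deprecation warning.
#[allow(deprecated)]
fn legacy_resolve(ctx: &TurnContext, path: &str) -> PathBuf {
    ctx.resolve_path(path)
}

/// Another existing caller with its own narrowly scoped suppression.
#[allow(deprecated)]
fn legacy_cwd(ctx: &TurnContext) -> &Path {
    ctx.cwd()
}
```

A crate-wide `#![allow(deprecated)]` would hide new callers, which is why the suppressions are placed per usage site.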
## Verification
- `just fmt`
- `cargo check -p codex-core`
- `cargo check -p codex-core --tests`
- `git diff --check`
# Description
Set the appropriate Product SKU for each type of client so that the apps
endpoints are fully functional.
# Testing
`./target/debug/codex --enable app`
<img width="1786" height="398" alt="CleanShot 2026-05-12 at 11 51 25@2x"
src="https://github.com/user-attachments/assets/2142f768-fc72-4fcb-8f39-9bd0d8569170"
/>
Regular Slack flows seem to work. Curling these endpoints with the
correct SKU also returns the right apps.
## Why
`code_mode_only` filters code-mode nested tools out of the top-level
tool list. For multi-agent v2, we need a rollout shape where the
collaboration tools remain callable as normal model tools without also
being embedded into the code-mode `exec` tool declaration.
Related to this:
https://openai-corpws.slack.com/archives/C0AQLHB4U75/p1778660267922549
## What Changed
- Adds `features.multi_agent_v2.non_code_mode_only`, including config
resolution, profile override handling, and generated schema coverage.
- Introduces `ToolExposure::DirectModelOnly` so a tool can be included
in the initial model-visible list while staying out of the nested
code-mode tool surface.
- Applies that exposure to the multi-agent v2 tools when the new flag is
set: `spawn_agent`, `send_message`, `followup_task`, `wait_agent`,
`close_agent`, and `list_agents`.
- Updates code-mode-only filtering so direct-model-only tools remain
visible while ordinary nested code-mode tools are still hidden.
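A simplified sketch of the filtering described above. `ToolExposure::Direct` and `ToolExposure::DirectModelOnly` are named in this description; the `Tool` struct, both functions, and the `read_file` tool used in the usage note are hypothetical, and the real filter has more cases (for example, other exposure variants and the `exec` tool itself).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolExposure {
    /// Callable by the model and embedded in the code-mode `exec` surface.
    Direct,
    /// Callable by the model only; never nested into code mode.
    DirectModelOnly,
}

/// Hypothetical tool record for illustration.
struct Tool {
    name: &'static str,
    exposure: ToolExposure,
}

/// Tools listed at the top level of the initial model-visible tool list.
fn top_level_tools(tools: &[Tool], code_mode_only: bool) -> Vec<&'static str> {
    tools
        .iter()
        .filter(|tool| match tool.exposure {
            // Direct-model-only tools stay visible even under `code_mode_only`.
            ToolExposure::DirectModelOnly => true,
            // Ordinary nested tools are hidden when `code_mode_only` is set.
            ToolExposure::Direct => !code_mode_only,
        })
        .map(|tool| tool.name)
        .collect()
}

/// Tools embedded into the nested code-mode tool surface.
fn nested_code_mode_tools(tools: &[Tool]) -> Vec<&'static str> {
    tools
        .iter()
        .filter(|tool| tool.exposure == ToolExposure::Direct)
        .map(|tool| tool.name)
        .collect()
}
```

Under these assumptions, with `code_mode_only` enabled, `spawn_agent` marked `DirectModelOnly` stays in the top-level list, while an ordinary `Direct` tool such as `read_file` is only reachable through code mode.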
## Verification
- Added config parsing/profile tests for `non_code_mode_only`.
- Added tool spec coverage for the code-mode-only multi-agent v2
exposure behavior.
## Why
Recent session history showed no active use of the raw `shell`,
`local_shell`, or `container.exec` execution surfaces. Keeping those
handlers/specs wired into core leaves duplicate shell execution paths
alongside the supported `shell_command` and unified exec tools.
## What changed
- Removed the raw `shell` handler/spec and its `ShellToolCallParams`
protocol helper.
- Removed the legacy `local_shell` and `container.exec` handler/spec
plumbing while preserving persisted-history compatibility for old
response items.
- Normalized model/config `default` and `local` shell selections to
`shell_command`.
- Pruned tests that exercised removed raw-shell/local-shell/apply-patch
variants and kept coverage on `shell_command`, unified exec, and
freeform `apply_patch`.
## Verification
- `git diff --check`
- `cargo test -p codex-protocol`
- `cargo test -p codex-tools`
- `cargo test -p codex-core tools::handlers::shell`
- `cargo test -p codex-core tools::spec`
- `cargo test -p codex-core tools::router`
- `cargo test -p codex-core
active_call_preserves_triggering_command_context`
- `cargo test -p codex-core guardian_tests`
- `cargo test -p codex-core --test all shell_serialization`
- `cargo test -p codex-core --test all apply_patch_cli`
- `cargo test -p codex-core --test all shell_command_`
- `cargo test -p codex-core --test all local_shell`
- `cargo test -p codex-core --test all otel::`
- `cargo test -p codex-core --test all hooks::`
- `just fix -p codex-core`
- `just fix -p codex-tools`
## Why
Stop sending duplicate `session_id`/`thread_id` headers. We only want
the hyphenated forms, because some proxies reject header names that
contain `_`.
Related discussion here:
https://openai.slack.com/archives/C095U48JNL9/p1778508316923179
## What
- Keep `session-id` and `thread-id`
- Remove the underscore aliases
## Why
Deferred tools were tracked with separate side-channel filtering after
tool specs had already been assembled. That made the registry
responsible for executing tools while the router/spec planner separately
decided whether those same tools should be exposed to the model up
front.
This PR makes exposure part of the tool handler contract so direct
versus deferred availability travels with the executable tool
registration.
The next step will be to simplify registration.
## What Changed
- Adds `ToolExposure` to `codex-tools` and exposes it through
`ToolExecutor`, defaulting tools to `Direct`.
- Teaches dynamic tools and MCP handlers to mark deferred tools as
`Deferred` at construction time.
- Renames the registry object-safe wrapper from `AnyToolHandler` to
`RegisteredTool` and uses `ToolExposure` when deciding whether to
include a handler's spec in the initial model-visible tool list.
- Refactors tool spec planning to derive direct specs and deferred
search entries from registered handlers, removing the router's
special-case deferred dynamic tool filtering.
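As a rough illustration of exposure travelling with the handler, the sketch below uses a trait with a defaulted `exposure` method. `ToolExposure`, `ToolExecutor`, `Direct`, and `Deferred` are names from this description; the trait's shape, `ShellCommand`, `McpTool`, and `plan_specs` are assumptions and do not reproduce the real `codex-tools` API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolExposure {
    Direct,
    Deferred,
}

/// Simplified stand-in for the executor contract.
trait ToolExecutor {
    fn name(&self) -> &str;

    /// Tools are exposed directly unless they opt out.
    fn exposure(&self) -> ToolExposure {
        ToolExposure::Direct
    }
}

/// Hypothetical built-in tool that keeps the default exposure.
struct ShellCommand;

impl ToolExecutor for ShellCommand {
    fn name(&self) -> &str {
        "shell_command"
    }
}

/// Hypothetical MCP-backed tool whose exposure is fixed at construction time.
struct McpTool {
    name: String,
    deferred: bool,
}

impl ToolExecutor for McpTool {
    fn name(&self) -> &str {
        &self.name
    }

    fn exposure(&self) -> ToolExposure {
        if self.deferred {
            ToolExposure::Deferred
        } else {
            ToolExposure::Direct
        }
    }
}

/// Splits registered tools into direct specs and deferred search entries.
fn plan_specs(tools: &[Box<dyn ToolExecutor>]) -> (Vec<String>, Vec<String>) {
    let mut direct = Vec::new();
    let mut deferred = Vec::new();
    for tool in tools {
        match tool.exposure() {
            ToolExposure::Direct => direct.push(tool.name().to_string()),
            ToolExposure::Deferred => deferred.push(tool.name().to_string()),
        }
    }
    (direct, deferred)
}
```

Because the planner asks each registered handler for its exposure, no separate side-channel list of deferred tools is needed.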
## Verification
- Not run.
## Why
Codex intentionally ignores unknown `config.toml` fields by default so
older and newer config files keep working across versions. That leniency
also makes typo detection hard because misspelled or misplaced keys
disappear silently.
This change adds an opt-in strict config mode so users and tooling can
fail fast on unrecognized config fields without changing the default
permissive behavior.
This feature is possible because `serde_ignored` exposes the exact
signal Codex needs: it lets Codex run ordinary Serde deserialization
while recording fields Serde would otherwise ignore. That avoids
requiring `#[serde(deny_unknown_fields)]` across every config type and
keeps strict validation opt-in around the existing config model.
## What Changed
### Added strict config validation
- Added `serde_ignored`-based validation for `ConfigToml` in
`codex-rs/config/src/strict_config.rs`.
- Combined `serde_ignored` with `serde_path_to_error` so strict mode
preserves typed config error paths while also collecting fields Serde
would otherwise ignore.
- Added strict-mode validation for unknown `[features]` keys, including
keys that would otherwise be accepted by `FeaturesToml`'s flattened
boolean map.
- Kept typed config errors ahead of ignored-field reporting, so
malformed known fields are reported before unknown-field diagnostics.
- Added source-range diagnostics for top-level and nested unknown config
fields, including non-file managed preference source names.
### Kept parsing single-pass per source
- Reworked file and managed-config loading so strict validation reuses
the already parsed `TomlValue` for that source.
- For actual config files and managed config strings, the loader now
reads once, parses once, and validates that same parsed value instead of
deserializing multiple times.
- Validated `-c` / `--config` override layers with the same
base-directory context used for normal relative-path resolution, so
unknown override keys are still reported when another override contains
a relative path.
### Scoped `--strict-config` to config-heavy entry points
- Added support for `--strict-config` on the main config-loading entry
points where it is most useful:
- `codex`
- `codex resume`
- `codex fork`
- `codex exec`
- `codex review`
- `codex mcp-server`
- `codex app-server` when running the server itself
- the standalone `codex-app-server` binary
- the standalone `codex-exec` binary
- Commands outside that set now reject `--strict-config` early with
targeted errors instead of accepting it everywhere through shared CLI
plumbing.
- `codex app-server` subcommands such as `proxy`, `daemon`, and
`generate-*` are intentionally excluded from the first rollout.
- When app-server strict mode sees invalid config, app-server exits with
the config error instead of logging a warning and continuing with
defaults.
- Introduced a dedicated `ReviewCommand` wrapper in `codex-rs/cli`
instead of extending shared `ReviewArgs`, so `--strict-config` stays on
the outer config-loading command surface and does not become part of the
reusable review payload used by `codex exec review`.
### Coverage
- Added tests for top-level and nested unknown config fields, unknown
`[features]` keys, typed-error precedence, source-location reporting,
and non-file managed preference source names.
- Added CLI coverage showing invalid `--enable`, invalid `--disable`,
and unknown `-c` overrides still error when `--strict-config` is
present, including compound-looking feature names such as
`multi_agent_v2.subagent_usage_hint_text`.
- Added integration coverage showing both `codex app-server
--strict-config` and standalone `codex-app-server --strict-config` exit
with an error for unknown config fields instead of starting with
fallback defaults.
- Added coverage showing unsupported command surfaces reject
`--strict-config` with explicit errors.
## Example Usage
Run Codex with strict config validation enabled:
```shell
codex --strict-config
```
Strict config mode is also available on the supported config-heavy
subcommands:
```shell
codex --strict-config exec "explain this repository"
codex review --strict-config --uncommitted
codex mcp-server --strict-config
codex app-server --strict-config --listen off
codex-app-server --strict-config --listen off
```
For example, if `~/.codex/config.toml` contains a typo in a key name:
```toml
model = "gpt-5"
approval_polic = "on-request"
```
then `codex --strict-config` reports the misspelled key instead of
silently ignoring it. The path is shortened to `~` here for readability:
```text
$ codex --strict-config
Error loading config.toml:
~/.codex/config.toml:2:1: unknown configuration field `approval_polic`
|
2 | approval_polic = "on-request"
| ^^^^^^^^^^^^^^
```
Without `--strict-config`, Codex keeps the existing permissive behavior
and ignores the unknown key.
Strict config mode also validates ad-hoc `-c` / `--config` overrides:
```text
$ codex --strict-config -c foo=bar
Error: unknown configuration field `foo` in -c/--config override
$ codex --strict-config -c features.foo=true
Error: unknown configuration field `features.foo` in -c/--config override
```
Invalid feature toggles are rejected too, including values that look
like nested config paths:
```text
$ codex --strict-config --enable does_not_exist
Error: Unknown feature flag: does_not_exist
$ codex --strict-config --disable does_not_exist
Error: Unknown feature flag: does_not_exist
$ codex --strict-config --enable multi_agent_v2.subagent_usage_hint_text
Error: Unknown feature flag: multi_agent_v2.subagent_usage_hint_text
```
Unsupported commands reject the flag explicitly:
```text
$ codex --strict-config cloud list
Error: `--strict-config` is not supported for `codex cloud`
```
## Verification
The `codex-cli` `strict_config` tests cover invalid `--enable`, invalid
`--disable`, the compound `multi_agent_v2.subagent_usage_hint_text`
case, unknown `-c` overrides, app-server strict startup failure through
`codex app-server`, and rejection for unsupported commands such as
`codex cloud`, `codex mcp`, `codex remote-control`, and `codex
app-server proxy`.
The config and config-loader tests cover unknown top-level fields,
unknown nested fields, unknown `[features]` keys, source-location
reporting, non-file managed config sources, and `-c` validation for keys
such as `features.foo`.
The app-server test suite covers standalone `codex-app-server
--strict-config` startup failure for an unknown config field.
## Documentation
The Codex CLI docs on developers.openai.com/codex should mention
`--strict-config` as an opt-in validation mode for supported
config-heavy entry points once this ships.
## Why
`just fmt` should align source formatting without resolving dependencies
or rewriting lockfiles. The Python SDK formatting steps run through
`uv`, so differing local `uv` versions could decide the SDK lock was
stale and mutate `sdk/python/uv.lock` before Ruff ran.
## What
- Add `--frozen` to both Python SDK `uv run ... ruff` commands in the
root `fmt` recipe.
- Update the existing Python SDK artifact workflow guard test so future
changes keep the formatter recipe non-lock-mutating.
## Verification
- `uv run --frozen --project ../sdk/python --extra dev pytest
../sdk/python/tests/test_artifact_workflow_and_binaries.py -q`
## Why
`chatwidget.rs` is still carrying too many unrelated responsibilities in
one file. #22269 started a five-phase cleanup to move coherent behavior
domains into focused modules while keeping `chatwidget.rs` as the
composition layer. #22407 completed phase 2 by extracting input and
submission flow.
This PR is phase 3. It keeps moving high-churn event handling out of the
central widget by extracting protocol, replay, streaming, and tool
lifecycle handling without changing the visible behavior those flows
already provide. This is once again just a mechanical movement of
existing functions. No functional changes.
## What Changed
- Added focused modules for protocol request dispatch, replay rendering,
assistant/plan/reasoning streaming, turn runtime bookkeeping, hook
lifecycle handling, command lifecycle handling, tool lifecycle
rendering, and interactive tool request prompts.
- Kept active-cell grouping, transcript invalidation, interrupt
deferral, and final-message separator behavior in the same flows, just
moved into smaller files.
- Added module header comments to the new files so the ownership
boundaries are explicit.
- Left `codex-rs/tui/src/chatwidget.rs` as the registration and
orchestration surface for these extracted behaviors.
## Cleanup Phases
The five-phase cleanup plan from #22269 is:
1. Phase 1: mechanical helper and state moves. Completed in #22269.
2. Phase 2: extract input and submission flow, including queued user
messages, shell prompt submission, pending steer restoration, and thread
input snapshot/restore behavior. Completed in #22407.
3. Phase 3: extract protocol, replay, streaming, and tool lifecycle
handling, while preserving active-cell grouping, transcript
invalidation, interrupt deferral, and final-message separator behavior.
This PR.
4. Phase 4: extract settings, popups, and status surfaces, including
model/reasoning/collaboration/personality popups, permission prompts,
rate-limit UI, and connectors helpers.
5. Phase 5: clean up the remaining constructor and orchestration code
once the larger behavior domains have moved out, leaving `chatwidget.rs`
as the composition layer.
## Why
The memories extension has several distinct responsibilities:
registering its prompt and tool contributors, enforcing local-memory
filesystem boundaries, implementing list/read/search behavior, and
wrapping that backend as extension tools. Those responsibilities were
concentrated in `lib.rs`, `local.rs`, and the tool modules, which made
follow-up work harder to review and risked growing files through
unrelated edits.
This PR reorganizes the crate so each responsibility has a narrower
owner while preserving the same extension entrypoint and memory tool
behavior.
## What Changed
- Moved extension lifecycle, prompt, and tool registration into
`src/extension.rs`, leaving `src/lib.rs` as the small crate entrypoint.
- Split `LocalMemoriesBackend` helpers into `local/list.rs`,
`local/path.rs`, `local/read.rs`, and `local/search.rs`.
- Centralized tool names and limits at the crate level, and kept the
backend and extension implementation crate-private.
- Made `memory_list`, `memory_read`, and `memory_search` tool executors
generic over `MemoriesBackend`, so tests can exercise the full executor
path without depending on tool internals.
- Consolidated and expanded memory extension tests in `src/tests.rs`,
including read/search tool output coverage, multi-query search, windowed
`all_within_lines`, and legacy `query` rejection.
## Testing
- Not run locally.
## Why
Picker-style UI in the TUI has accumulated a mix of hardcoded navigation
keys. Some lists supported page movement, some did not; some accepted
Vim-like keys, while others only accepted arrows; and tabbed or
horizontally adjustable pickers had no shared keymap action for
left/right movement.
This PR makes picker/list navigation consistent and configurable so
users can rely on the same defaults across the TUI.
## What Changed
- Adds shared list keymap actions for:
- vertical movement: `move_up`, `move_down`
- horizontal movement: `move_left`, `move_right`
- paging and jumps: `page_up`, `page_down`, `jump_top`, `jump_bottom`
- Adds defaults:
- Up/down: arrows, `Ctrl+P/N`, `Ctrl+K/J`, and plain `k/j` where text
input is not active
- Page up/down: `PageUp/PageDown` and `Ctrl+B/F`
- First/last: `Home/End`
- Left/right: `Left/Right` and `Ctrl+H/L`
- Wires the shared list keymap through picker and list surfaces
including session resume, multi-select, tabbed selection lists,
settings-style lists, app-link selection, MCP elicitation,
request-user-input, and the OSS selection wizard.
- Keeps search behavior intact by reserving printable characters for
query text in searchable pickers.
- Updates keymap setup actions, config schema, snapshots, and focused
coverage for the new list actions.
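The defaults listed above can be summarized in a small lookup. This is only a sketch: the `Key` and `ListAction` enums and `default_list_action` are hypothetical and do not reflect the TUI's actual key event or keymap types, and the real bindings are configurable rather than hardcoded.

```rust
/// Hypothetical, simplified key representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Key {
    Up,
    Down,
    Left,
    Right,
    PageUp,
    PageDown,
    Home,
    End,
    Char(char),
    Ctrl(char),
}

/// Mirrors the shared list actions named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ListAction {
    MoveUp,
    MoveDown,
    MoveLeft,
    MoveRight,
    PageUp,
    PageDown,
    JumpTop,
    JumpBottom,
}

/// Resolves a key to its default list action, if any.
/// Plain `j`/`k` navigate only when no text input (such as a search query)
/// is active, so printable characters stay available for typing.
fn default_list_action(key: Key, text_input_active: bool) -> Option<ListAction> {
    match key {
        Key::Up | Key::Ctrl('p') | Key::Ctrl('k') => Some(ListAction::MoveUp),
        Key::Down | Key::Ctrl('n') | Key::Ctrl('j') => Some(ListAction::MoveDown),
        Key::Char('k') if !text_input_active => Some(ListAction::MoveUp),
        Key::Char('j') if !text_input_active => Some(ListAction::MoveDown),
        Key::PageUp | Key::Ctrl('b') => Some(ListAction::PageUp),
        Key::PageDown | Key::Ctrl('f') => Some(ListAction::PageDown),
        Key::Home => Some(ListAction::JumpTop),
        Key::End => Some(ListAction::JumpBottom),
        Key::Left | Key::Ctrl('h') => Some(ListAction::MoveLeft),
        Key::Right | Key::Ctrl('l') => Some(ListAction::MoveRight),
        _ => None,
    }
}
```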
## How to Test
1. Start Codex from this branch and open the session picker, for example
with an existing session history.
2. In the session list, verify that `Ctrl+J/K` moves the selection
down/up.
3. Verify that `Ctrl+F/B` pages down/up and `Home/End` jumps to the
first/last visible session.
4. Type printable search text such as `j` or `k` and confirm it updates
the query instead of navigating.
5. Focus a picker control that changes values horizontally, such as a
session picker toolbar control, and verify `Ctrl+H/L` changes the
focused value like left/right arrows.
Targeted tests run:
- `cargo test -p codex-tui keymap::tests::`
- `cargo test -p codex-tui keymap_setup::tests::`
- `cargo test -p codex-tui horizontal_list_keys`
- `cargo test -p codex-tui page_and_jump_navigation_use_list_keymap`
- `cargo test -p codex-tui ctrl_h_l_move_provider_selection`
- `cargo test -p codex-tui scroll_state::tests`
- `cargo test -p codex-tui
switching_tabs_changes_visible_items_and_clears_search`
- `cargo test -p codex-tui toggle_sort_key_reloads_with_new_sort`
Also ran `just write-config-schema`, `just fmt`, `just fix -p
codex-tui`, `just argument-comment-lint`, and `git diff --check`.
Note: `cargo test -p codex-tui` was attempted and still aborts with the
pre-existing stack overflow in
`tests::fork_last_filters_latest_session_by_cwd_unless_show_all`, which
is unrelated to this branch.
## Why
Extensions can observe thread and turn lifecycle events today, but there
was no single host-owned hook for changes to the effective thread
configuration. That makes features that need to react to model,
permission, or tool-suggest updates either depend on individual mutation
paths or risk going stale after runtime config refreshes.
This adds a typed config-change contributor so extension-owned state can
stay synchronized with the effective thread config while the host
remains responsible for deciding when config changed.
## What Changed
- Added `ConfigContributor<C>` to `codex_extension_api`, with
before/after immutable snapshots of the effective config plus
session/thread extension stores.
- Added registry builder/accessor support through `config_contributor`
and `config_contributors`.
- Emits config-change callbacks after committed updates from session
settings, per-turn setting updates, and `refresh_runtime_config`.
- Builds effective config snapshots only when config contributors are
registered, and suppresses no-op callbacks when the before/after
snapshots are equal.
- Added a core session regression test that verifies contributors
observe both model changes and user-layer runtime config changes,
including access to session and thread extension stores.
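The callback rules above (skip snapshots when nobody is registered, suppress no-op changes) can be sketched like this. Everything here is a simplified assumption: the real `ConfigContributor<C>` is generic and also receives extension stores, and `EffectiveConfig`, `ModelChangeLog`, and `apply_update` are hypothetical names.

```rust
/// Hypothetical effective-config snapshot.
#[derive(Debug, Clone, PartialEq, Eq)]
struct EffectiveConfig {
    model: String,
}

/// Simplified contributor contract: immutable before/after snapshots.
trait ConfigContributor {
    fn on_config_changed(&mut self, before: &EffectiveConfig, after: &EffectiveConfig);
}

/// Example contributor that records the model transitions it observes.
#[derive(Default)]
struct ModelChangeLog {
    changes: Vec<(String, String)>,
}

impl ConfigContributor for ModelChangeLog {
    fn on_config_changed(&mut self, before: &EffectiveConfig, after: &EffectiveConfig) {
        self.changes.push((before.model.clone(), after.model.clone()));
    }
}

/// Commits `update`, then notifies contributors only if something changed.
fn apply_update<T: ConfigContributor>(
    config: &mut EffectiveConfig,
    contributors: &mut [T],
    update: impl FnOnce(&mut EffectiveConfig),
) {
    // No contributors registered: skip building snapshots entirely.
    if contributors.is_empty() {
        update(config);
        return;
    }
    let before = config.clone();
    update(config);
    // Suppress callbacks for updates that leave the effective config equal.
    if before == *config {
        return;
    }
    for contributor in contributors.iter_mut() {
        contributor.on_config_changed(&before, config);
    }
}
```

The host decides when a change happened; contributors only see committed before/after pairs.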
## Validation
Added `config_change_contributor_observes_effective_config_changes` in
`codex-rs/core/src/session/tests.rs` to cover the new contributor path.
## Why
Spawned agents can already override `model` and `reasoning_effort`, but
they have no equivalent way to opt into a model-supported service tier.
That makes it impossible to preserve or intentionally select tiered
execution behavior when delegating work to a sub-agent, even though the
model catalog already advertises supported `service_tiers`.
## What changed
- Add optional `service_tier` to both legacy and `MultiAgentV2`
`spawn_agent` tool inputs.
- Show each picker-visible model's supported service tier ids and
descriptions in the `spawn_agent` tool guidance.
- Resolve service tier selection after the child agent's effective model
is known.
- Inherit the parent tier when omitted and still supported by the final
child model; otherwise clear it.
- Reject explicit unsupported tier requests with a model-facing error.
- Keep explicit `service_tier` usable on full-history forks, while still
honoring the existing model/reasoning fork restrictions.
- Hide `service_tier` alongside other spawn metadata when
`hide_spawn_agent_metadata` is enabled.
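The tier resolution rules above reduce to a small decision function. The sketch below is an assumption about shape only: `resolve_service_tier`, its signature, and the error text are hypothetical, and the tier ids in the usage note are made up.

```rust
/// Resolves the child agent's service tier once its effective model, and
/// therefore its supported tiers, is known.
///
/// - An explicit supported request wins.
/// - An explicit unsupported request is an error surfaced to the model.
/// - With no request, the parent tier is inherited only if the child model
///   supports it; otherwise the tier is cleared.
fn resolve_service_tier(
    requested: Option<&str>,
    parent_tier: Option<&str>,
    child_supported: &[&str],
) -> Result<Option<String>, String> {
    match requested {
        Some(tier) if child_supported.contains(&tier) => Ok(Some(tier.to_string())),
        Some(tier) => Err(format!(
            "service tier `{tier}` is not supported by the selected model"
        )),
        None => Ok(parent_tier
            .filter(|tier| child_supported.contains(tier))
            .map(str::to_string)),
    }
}
```

For example, with hypothetical tier ids, a parent running on `flex` that spawns a child whose model supports only `priority` ends up with no tier on the child, whereas explicitly requesting `flex` for that child is rejected.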
## Verification
Added focused coverage for:
- v1/v2 `spawn_agent` schema exposure for `service_tier`
- tier descriptions in spawn guidance
- hidden-metadata suppression
- explicit supported tier selection
- explicit unknown and unsupported tier rejection
- inherited tier preservation or clearing based on child-model support
- full-history fork acceptance for explicit service tiers in both v1 and
v2
Local Rust tests were not run in this workspace per repo guidance; the
new coverage is included for CI.
## Why
We added Zellij-specific TUI workarounds because older Zellij behavior
did not work with Codex's normal terminal model:
- #8555 made `tui.alternate_screen = "auto"` disable alternate screen in
Zellij so transcript history stayed available.
- #16578 avoided scroll-region operations in Zellij by emitting raw
newlines and using a separate composer styling path.
This PR removes both workarounds because the latest Zellij release
tested locally (`zellij 0.44.1`) works correctly with Codex's standard
TUI behavior: normal alternate-screen handling, redraw, and history
insertion.
## What Changed
- Removed the `InsertHistoryMode::Zellij` path and the Zellij-only
newline scrollback insertion behavior.
- Removed cached `is_zellij` state from the TUI and composer.
- Removed Zellij-specific composer styling, the helper snapshot, and the
`TerminalInfo::is_zellij()` convenience method that only served this
workaround.
- Changed `tui.alternate_screen = "auto"` to use alternate screen for
Zellij too; `--no-alt-screen` and `tui.alternate_screen = "never"` still
preserve the inline mode escape hatch.
- Updated the generated config schema description for
`tui.alternate_screen`.
## How to Test
Manual smoke path used with `zellij 0.44.1`:
1. Build and run this branch inside a Zellij `0.44.1` session with
default config.
2. Start Codex normally and produce enough assistant/tool output to
create scrollback.
3. Confirm the transcript remains readable, the composer renders
normally, and scrolling through terminal history works.
4. Resize the Zellij pane while output exists and confirm the TUI
redraws without duplicated, missing, or stale rows.
5. Compare with `--no-alt-screen` or `-c tui.alternate_screen=never` if
you want to verify the inline fallback still works.
Targeted tests:
- `just write-config-schema`
- `just fmt`
- `just fix -p codex-tui`
- `cargo test -p codex-terminal-detection`
- `cargo test -p codex-tui alternate_screen_auto_uses_alt_screen`
Attempted but did not complete locally:
- `cargo test -p codex-tui` built and ran the new test successfully,
then failed later on unrelated local failures in
`status_permissions_full_disk_managed_*` and a stack overflow in
`tests::fork_last_filters_latest_session_by_cwd_unless_show_all`.
## Documentation
No developers.openai.com Codex documentation update is needed for this
revert.
## Summary
- add a scoped `level_id` to `ExtensionData` and expose it through
`level_id()`
- remove `thread_id`/`turn_id` parameters from extension contributor
inputs where the scoped `ExtensionData` already carries that identity
- move turn-scoped extension data onto `TurnContext` so token usage and
lifecycle contributors can share the same turn store
## Testing
- `cargo check -p codex-extension-api -p codex-core --tests`
- `cargo test -p codex-extension-api`
- `cargo test -p codex-guardian`
- `cargo test -p codex-core --lib
record_token_usage_info_notifies_extension_contributors`
- `cargo test -p codex-core --lib
submission_loop_channel_close_emits_thread_stop_lifecycle`
- `cargo test -p codex-core --lib
submission_loop_channel_close_aborts_active_turn_before_thread_stop_lifecycle`
- `just fix -p codex-extension-api`
- `just fix -p codex-guardian`
- `just fix -p codex-core`
- `just fmt`
## Note
- Attempted `cargo test -p codex-core`; it aborted in
`agent::control::tests::spawn_agent_fork_last_n_turns_keeps_only_recent_turns`
with the existing stack overflow before the full suite completed.
## Summary
- Split macOS Rust release builds into a dedicated `build-macos` job
- Attach the `macos-signing` environment only to the macOS signing/build
job
- Keep Linux release builds outside the Apple signing environment while
preserving the existing shared release build steps
## Why
Extensions need a stable place to observe token accounting after Codex
folds model-provider usage into the session's cached `TokenUsageInfo`.
Without a contributor hook, extension-owned features that need last-turn
or cumulative token usage have to duplicate session plumbing or infer
state from client-facing `TokenCount` notifications.
## What changed
- Added `TokenUsageContributor` to `codex-extension-api`, passing
session/thread `ExtensionData`, `ThreadId`, turn id, and the current
`TokenUsageInfo`.
- Added registry builder/storage support for token-usage contributors.
- Invoked registered contributors from
`Session::record_token_usage_info` after the session token cache is
updated and before the client `TokenCount` notification is emitted.
## Testing
- Added `record_token_usage_info_notifies_extension_contributors`,
covering cumulative token usage updates and access to both extension
stores.
## Why
The thread lifecycle contributor hooks from #22476 should observe every
session teardown. The explicit `Op::Shutdown` path already emitted
`on_thread_stop`, but when `submission_loop` exited because its
submission channel closed, it only tore down runtime services. That
meant extensions could miss the thread-stop lifecycle signal on implicit
runtime shutdown.
## What Changed
- Split shared runtime teardown into `shutdown_runtime_services(...)`.
- Split thread-stop lifecycle emission into
`emit_thread_stop_lifecycle(...)`.
- Reused those helpers from both explicit shutdown and the channel-close
shutdown path.
- Tracked whether `Op::Shutdown` was received so the explicit path does
not double-emit lifecycle events after it exits the loop.
- Added a regression test that closes the submission channel and asserts
`ThreadLifecycleContributor::on_thread_stop` runs once with the expected
thread/session stores.
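The control flow can be sketched with a synchronous stand-in for the loop. The helper names `shutdown_runtime_services` and `emit_thread_stop_lifecycle` and the `Op::Shutdown` variant come from this description; the `std::sync::mpsc` channel, the `Lifecycle` counters, the `Op::UserInput` variant, and the exact placement of the calls are assumptions made so the example is self-contained.

```rust
use std::sync::mpsc::Receiver;

/// Simplified operation type; `UserInput` is a hypothetical placeholder.
enum Op {
    UserInput(String),
    Shutdown,
}

/// Hypothetical counters standing in for real teardown side effects.
#[derive(Default)]
struct Lifecycle {
    runtime_shutdowns: u32,
    thread_stops: u32,
}

fn shutdown_runtime_services(lifecycle: &mut Lifecycle) {
    lifecycle.runtime_shutdowns += 1;
}

fn emit_thread_stop_lifecycle(lifecycle: &mut Lifecycle) {
    lifecycle.thread_stops += 1;
}

fn submission_loop(rx: Receiver<Op>, lifecycle: &mut Lifecycle) {
    let mut shutdown_received = false;
    // `recv` returns `Err` once every sender is dropped (channel closed).
    while let Ok(op) = rx.recv() {
        match op {
            Op::UserInput(_) => { /* handle the submission */ }
            Op::Shutdown => {
                // Explicit path: tear down and emit exactly once.
                shutdown_runtime_services(lifecycle);
                emit_thread_stop_lifecycle(lifecycle);
                shutdown_received = true;
                break;
            }
        }
    }
    // Implicit path: the channel closed without `Op::Shutdown`.
    if !shutdown_received {
        shutdown_runtime_services(lifecycle);
        emit_thread_stop_lifecycle(lifecycle);
    }
}
```

The `shutdown_received` flag is what keeps the explicit path from emitting the thread-stop event a second time after the loop exits.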
## Testing
- `cargo test -p codex-core
submission_loop_channel_close_emits_thread_stop_lifecycle`
## Why
Extensions can already contribute prompt, tool, turn-item, and
thread-lifecycle behavior, but there was no explicit host-owned hook for
per-turn setup and cleanup. That makes extension-private turn state
awkward: an extension either has to stash it outside the turn lifecycle
or depend on core runtime objects.
This adds a small turn lifecycle boundary. Extensions receive stable
identifiers plus the existing session, thread, and turn `ExtensionData`
stores, while core keeps owning task scheduling, cancellation, and turn
teardown.
## What Changed
- Added `TurnLifecycleContributor` with `on_turn_start`, `on_turn_stop`,
and `on_turn_abort` callbacks in `codex-rs/ext/extension-api`.
- Added typed `TurnStartInput`, `TurnStopInput`, and `TurnAbortInput`
payloads that expose `thread_id`, `turn_id`, `session_store`,
`thread_store`, and `turn_store`.
- Registered and re-exported turn lifecycle contributors through
`ExtensionRegistry` and `ExtensionRegistryBuilder`.
- Wired `Session` to emit turn start, stop, and abort callbacks from the
existing turn/task lifecycle paths.
- Carried the turn-scoped `ExtensionData` through `RunningTask` and
`RemovedTask` so stop/abort callbacks receive the same turn store
created at turn start.
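A reduced sketch of how a contributor might use the turn store across start and stop. The trait and payload names come from this description, but everything else is an assumption: `ExtensionData` is modeled as a plain `HashMap`, the payloads carry only `turn_id` and `turn_store`, the default no-op methods are a guess, `on_turn_abort` is omitted, and `TurnNotes` and `run_turn` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real typed `ExtensionData` store.
type ExtensionData = HashMap<String, String>;

struct TurnStartInput<'a> {
    turn_id: &'a str,
    turn_store: &'a mut ExtensionData,
}

struct TurnStopInput<'a> {
    turn_id: &'a str,
    turn_store: &'a mut ExtensionData,
}

/// Reduced contract; default no-op bodies are an assumption.
trait TurnLifecycleContributor {
    fn on_turn_start(&self, _input: TurnStartInput<'_>) {}
    fn on_turn_stop(&self, _input: TurnStopInput<'_>) {}
}

/// Example contributor that keeps its per-turn state in the turn store.
struct TurnNotes;

impl TurnLifecycleContributor for TurnNotes {
    fn on_turn_start(&self, input: TurnStartInput<'_>) {
        let TurnStartInput { turn_id, turn_store } = input;
        turn_store.insert("note".to_string(), format!("started {turn_id}"));
    }

    fn on_turn_stop(&self, input: TurnStopInput<'_>) {
        let TurnStopInput { turn_id: _, turn_store } = input;
        if let Some(note) = turn_store.get_mut("note") {
            note.push_str(", stopped");
        }
    }
}

/// Host side: the same turn store is handed to both callbacks.
fn run_turn(contributors: &[&dyn TurnLifecycleContributor], turn_id: &str) -> ExtensionData {
    let mut turn_store = ExtensionData::new();
    for contributor in contributors {
        contributor.on_turn_start(TurnStartInput { turn_id, turn_store: &mut turn_store });
    }
    // The turn's task would run here; scheduling stays owned by the host.
    for contributor in contributors {
        contributor.on_turn_stop(TurnStopInput { turn_id, turn_store: &mut turn_store });
    }
    turn_store
}
```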
## Verification
- Not run locally.
## Why
Extensions that need thread-scoped state currently only get a start-time
callback. That is enough for seeding stores, but it leaves the host
without a shared extension seam for later thread rehydrate and flush
work as thread ownership evolves. This PR turns that start-only seam
into a host-owned thread lifecycle contributor contract so
extension-private state can stay behind the extension API instead of
leaking extra orchestration through core.
## What changed
- Replaced `ThreadStartContributor` with `ThreadLifecycleContributor`
and added typed lifecycle inputs for thread start, resume, and stop. The
contract lives in
[`contributors/thread_lifecycle.rs`](d0e9211f70/codex-rs/ext/extension-api/src/contributors/thread_lifecycle.rs (L1-L64)).
- Kept the existing start-time behavior intact by routing session
construction through `on_thread_start`.
- Invoked `on_thread_stop` during session shutdown before thread-scoped
extension state is dropped, while isolating contributor failures behind
warning logs.
- Migrated `git-attribution` and `guardian` onto the lifecycle
registration path.
- Renamed the extension registry plumbing from start-specific
contributors to lifecycle-specific contributors.
## Notes
`on_thread_resume` is introduced at the API boundary here so extensions
can target the final lifecycle shape; host resume dispatch can be wired
where that runtime path is finalized.
## Summary
- move `plugin/list` from the shared `config` read queue onto a
dedicated `plugin-list` shared-read queue
- move `plugin/read` onto that same dedicated shared-read queue as well
- keep the existing scheduler behavior unchanged
- allow plugin list/read operations to proceed independently of
config-family writes, accepting temporary stale or transient read errors
during concurrent mutations
## Validation
- `just fmt`
- `cargo test -p codex-app-server-protocol`
## Why
Extension tools were split across two public runtime contracts:
`codex-tool-api` exposed `ToolBundle` plus its own call/spec/error
types, while core native tools used `codex_tools::ToolExecutor`. That
made contributed tool specs and execution behavior easy to drift apart
and added another crate boundary for what should be one executable-tool
seam.
This PR makes `ToolExecutor` the single runtime contract and keeps
extension-specific pinning in `codex-extension-api`.
## Remaining todo
https://github.com/openai/codex/pull/22369/changes#diff-b935ea8245c3ce568a30cff660175fa6390b66b872ae409e1e2e965738250741R5
Either make `Invocation` generic, or extract `ToolCall` and clean up
`ToolInvocation`.
## What changed
- Removed the `codex-tool-api` workspace crate and its dependencies from
core and `codex-extension-api`.
- Made `codex_tools::ToolExecutor` object-safe with `async_trait` so
extension contributors can return a dyn executor.
- Added the extension-facing aliases under
`ext/extension-api/src/contributors/tools.rs`, including
`ExtensionToolExecutor = dyn ToolExecutor<ToolCall, Output =
ExtensionToolOutput>`.
- Changed `ToolContributor::tools` to return extension executors
directly instead of `ToolBundle`s.
- Updated core’s extension tool handler/registry/router path to adapt
those extension executors into the existing native `ToolInvocation`
runtime path.
- Added focused coverage for extension tools being registered,
model-visible, dispatchable, and not replacing built-in tools.
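To illustrate why object safety matters here, the sketch below hand-writes the boxed-future signature that `async_trait` roughly expands to, so the example needs no external crates. It is an assumption-laden simplification: the method names `name` and `execute`, the `ToolCall` fields, the `String` output type, `Echo`, `run_tool`, and the tiny `block_on` helper are all hypothetical and not the real `codex_tools` API.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Boxed future, roughly what `async_trait` generates for an async method.
pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Simplified stand-in for `ToolExecutor`; the signatures are assumptions.
pub trait ToolExecutor<Invocation: Send + 'static>: Send + Sync {
    type Output;
    fn name(&self) -> &str;
    fn execute<'a>(&'a self, invocation: Invocation) -> BoxFuture<'a, Self::Output>;
}

/// Hypothetical extension-facing call payload.
pub struct ToolCall {
    pub arguments: String,
}

/// Alias in the spirit of `ExtensionToolExecutor`, with `String` standing in
/// for the real output type.
pub type ExtensionToolExecutor = dyn ToolExecutor<ToolCall, Output = String>;

pub struct Echo;

impl ToolExecutor<ToolCall> for Echo {
    type Output = String;

    fn name(&self) -> &str {
        "echo"
    }

    fn execute<'a>(&'a self, invocation: ToolCall) -> BoxFuture<'a, String> {
        Box::pin(async move { format!("echo: {}", invocation.arguments) })
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, adequate only for this demonstration.
pub fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
        std::thread::yield_now();
    }
}

/// Dispatches by name over a list of dyn executors.
pub fn run_tool(tools: &[Box<ExtensionToolExecutor>], name: &str, arguments: &str) -> Option<String> {
    let tool = tools.iter().find(|tool| tool.name() == name)?;
    Some(block_on(tool.execute(ToolCall { arguments: arguments.to_string() })))
}
```

Because the trait has no generic methods and returns a boxed future, contributors can hand back `Box<dyn ...>` values that the host stores in one homogeneous list.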
## Verification
- `cargo test -p codex-tools`
- `cargo test -p codex-extension-api`
## Why
Codex still models model-visible tools and executable behavior largely
inside `codex-core`, which makes it harder to evolve the tool system
toward a single reusable abstraction for built-ins, MCP-backed tools,
dynamic tools, and later tools injected from outside core.
This PR takes the next incremental step in that direction by moving the
common execution-facing pieces out of core and separating them from
core-only orchestration. The intent is to let shared tool abstractions
improve in one place, while `codex-core` keeps the parts that are still
inherently host-specific today, such as `ToolInvocation`, dispatch
wiring, and hook integration.
This PR is mostly moving things around. The only interesting piece is
this abstraction:
https://github.com/openai/codex/pull/22359/changes#diff-81af519002548ba51ed102bdaaf77e081d40a1e73a6e5f9b104bbbc96a6f1b3dR13
## What changed
- Added `codex_tools::ToolExecutor<Invocation>` as the shared execution
trait for model-visible tools.
- Moved the reusable execution support types from `codex-core` into
`codex-tools`:
- `FunctionCallError`
- `ToolPayload`
- `ToolOutput`
- Refactored core tool implementations so that execution behavior lives
on `ToolExecutor<ToolInvocation>`, while `ToolHandler` remains the
core-local extension point for hook payloads, telemetry tags, diff
consumers, and other orchestration concerns.
- Kept the registry and dispatch flow behaviorally unchanged while
making the shared/extracted boundary explicit across built-in, MCP,
dynamic, extension-backed, shell, and multi-agent tool handlers.
## Verification
- `cargo test -p codex-tools`
- `just fix -p codex-tools`
- `just fix -p codex-core`
- `cargo test -p codex-core` progressed through the updated tool
surfaces and then hit the existing unrelated multi-agent stack overflow
in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.
## Why
`codex-extension-api` needs an approval hook that lets an installed
extension own a rendered approval-review prompt and produce the final
`ReviewDecision`. The prior interceptor stub only exposed a yes/no claim
and did not model the review result itself, which left the host with the
missing half of the control flow.
## What changed
- Replaces `ApprovalInterceptorContributor` with
[`ApprovalReviewContributor`](c49d17531e/codex-rs/ext/extension-api/src/contributors.rs (L43-L55)),
which may claim a rendered prompt and return an async `ReviewDecision`.
- Re-exports the new contributor and future types from `extension-api`.
- Adds registry support through `approval_review_contributor(...)` plus
[`ExtensionRegistry::approval_review(...)`](c49d17531e/codex-rs/ext/extension-api/src/registry.rs (L90-L101)),
which returns the first installed contributor that claims the prompt.
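The "first installed contributor that claims the prompt" rule might look something like the sketch below. It is synchronous for brevity (the PR describes an async `ReviewDecision`), it folds the builder and registry into one type, and the `ReviewDecision` variants plus the `DenyRm` and `ApproveAll` contributors are hypothetical.

```rust
/// Hypothetical variants; the real `ReviewDecision` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReviewDecision {
    Approved,
    Denied,
}

pub trait ApprovalReviewContributor {
    /// Returns `Some(decision)` when this contributor claims the prompt,
    /// or `None` to let a later contributor look at it.
    fn review(&self, prompt: &str) -> Option<ReviewDecision>;
}

/// Simplified registry: contributors are consulted in installation order.
#[derive(Default)]
pub struct ExtensionRegistry {
    reviewers: Vec<Box<dyn ApprovalReviewContributor>>,
}

impl ExtensionRegistry {
    pub fn approval_review_contributor(
        mut self,
        contributor: impl ApprovalReviewContributor + 'static,
    ) -> Self {
        self.reviewers.push(Box::new(contributor));
        self
    }

    /// First claim wins; `None` means no installed contributor claimed it.
    pub fn approval_review(&self, prompt: &str) -> Option<ReviewDecision> {
        self.reviewers.iter().find_map(|reviewer| reviewer.review(prompt))
    }
}

pub struct DenyRm;

impl ApprovalReviewContributor for DenyRm {
    fn review(&self, prompt: &str) -> Option<ReviewDecision> {
        prompt.contains("rm -rf").then_some(ReviewDecision::Denied)
    }
}

pub struct ApproveAll;

impl ApprovalReviewContributor for ApproveAll {
    fn review(&self, _prompt: &str) -> Option<ReviewDecision> {
        Some(ReviewDecision::Approved)
    }
}
```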
## Summary
- move the `view_image` sandbox filesystem-read unit test onto a
temporary cwd
- keep the turn cwd and selected turn environment cwd aligned inside the
test
- avoid leaving `core/image.png` behind in the repo checkout after the
test runs
## Root cause
The test wrote `image.png` beneath `turn.cwd`, and the shared session
test helper defaults that cwd to the current repo directory when no
override is provided.
## Validation
- `just fmt`
- `cargo test -p codex-core
tools::handlers::view_image::tests::handle_passes_sandbox_context_for_local_filesystem_reads`
Adds plugin/share/checkout to turn a shared remote plugin into a local
working copy under ~/plugins/<name>.
Registers the copy in the managed personal marketplace and records the
remote-to-local mapping for later share/save flows.
---------
Co-authored-by: Codex <noreply@openai.com>
# Why
Hook trust is granted through the TUI's `/hooks` view, so it can block
non-interactive use cases. This flag lets users who run Codex
headlessly bypass hook trust when they want to.
# What
This adds one invocation-scoped escape hatch.
- the CLI flag sets a runtime-only `bypass_hook_trust` override; there
is no durable `config.toml` setting
- hook discovery still respects normal enablement, so explicitly
disabled hooks remain disabled
- we show a `--dangerously-bypass-hook-trust is enabled. Enabled hooks
may run without review for this invocation.` message on startup so
accidental use is visible in both interactive and exec flows
This keeps “enabled” and “trusted” as separate concepts in the normal
path, while giving CI/E2E callers a stable way to opt into the
exceptional path when they already control the hook set.
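A minimal sketch of the "enabled" versus "trusted" split, assuming a hypothetical `Hook` record and `runnable_hooks` helper (neither is taken from the PR): the bypass only relaxes the trust check, never the enablement check.

```rust
/// Hypothetical hook record for illustration.
pub struct Hook {
    pub name: &'static str,
    pub enabled: bool,
    pub trusted: bool,
}

/// A hook runs only if it is enabled, and either trusted or covered by the
/// invocation-scoped bypass.
pub fn runnable_hooks(hooks: &[Hook], bypass_hook_trust: bool) -> Vec<&'static str> {
    hooks
        .iter()
        .filter(|hook| hook.enabled && (hook.trusted || bypass_hook_trust))
        .map(|hook| hook.name)
        .collect()
}
```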
# Why
Linked worktrees currently load their own project hook declarations, so
the same repo can present different hook definitions depending on which
checkout is active. https://github.com/openai/codex/pull/21762 tried to
share trust by giving matching worktree hooks a shared synthetic key,
but review pointed out that divergent worktree hook definitions would
then fight over one `trusted_hash`.
Instead of introducing a second trust model, this makes linked worktrees
use the root checkout as the single source of truth for project hook
declarations. Worktree-local project config can still diverge for
unrelated settings, but project hooks now keep one real source path and
one trust state per repo.
# What
- Teach project config loading to remember the matching root-checkout
`.codex/` folder for actual linked-worktree project layers.
- Keep ordinary project config sourced from the worktree, but replace
project hook declarations with the root checkout's matching layer before
hook discovery runs, including linked-worktree layers with `.codex/` but
no local `config.toml`.
- Make hook discovery use that authoritative hook folder for both
`hooks.json` and TOML hook source paths, so linked worktrees produce the
same hook key and trust state as the root checkout.
- Cover the linked-worktree path plus regressions for missing worktree
`config.toml` and nested non-worktree project roots.
## Why
`UnavailableDummyTools` kept synthetic placeholder tools alive for
historical tool calls whose backing MCP tool was no longer available.
That path adds stale model-visible tool specs and special routing at the
point where unavailable MCP calls should use ordinary current-tool
handling. This removes the runtime backfill instead of preserving a
second compatibility lane.
## Is it safe to remove?
The unavailable tools were added in #17853 after a CS issue when a
previously-called MCP tool failed to load and was omitted from the CS
spec. Now that we have tool search, I think this is resolved:
- API merges tools from previous TST output into the effective tool set,
so they're always in the CS spec
- if an MCP tool surfaced by TST later becomes unavailable, the model
can still call it and it will just return a model-visible error
- both TST output and function call output are dropped on compaction, so
the model will not remember old MCP calls after compaction
## What changed
- Delete unavailable-tool collection, placeholder handler, router/spec
plumbing, and obsolete placeholder coverage.
- Keep `features.unavailable_dummy_tools` as a removed no-op feature
tombstone so existing configs still parse cleanly.
- Add an integration-style `tool_search` regression test showing that a
deferred MCP tool surfaced through `tool_search` still routes through
MCP and returns a model-visible tool-call error rather than `unsupported
call`.
## Verification
- `cargo test -p codex-core tool_search`
## Why
`chatwidget.rs` is still carrying too many unrelated responsibilities in
one file. #22269 started a five-phase effort to move coherent behavior
domains into focused modules while keeping `chatwidget.rs` as the
composition layer.
This PR is phase 2 of that plan. It extracts the input and submission
flow as a mechanical move before the later protocol, popup/status, and
constructor/orchestration phases.
## What Changed
- Added `codex-rs/tui/src/chatwidget/input_flow.rs` for composer input
results, queued user-message draining, pending-input previews, and
mode-specific submission entry points.
- Added `codex-rs/tui/src/chatwidget/input_submission.rs` for
user-message construction/submission, shell prompt submission,
structured mention resolution, and blocked image draft restoration.
- Added `codex-rs/tui/src/chatwidget/input_restore.rs` for
initial-message submission, pending steer restoration after interrupts,
and thread input snapshot/restore behavior.
- Registered the new modules and removed the moved `ChatWidget` impl
methods from `codex-rs/tui/src/chatwidget.rs`.
## Follow-On Refactor Phases
The five-phase plan from #22269 is:
- Phase 1: mechanical helper and state moves. Completed in #22269.
- Phase 2: extract input and submission flow, including queued user
messages, shell prompt submission, pending steer restoration, and thread
input snapshot/restore behavior. This PR.
- Phase 3: extract protocol, replay, streaming, and tool lifecycle
handling, while preserving active-cell grouping, transcript
invalidation, interrupt deferral, and final-message separator behavior.
- Phase 4: extract settings, popups, and status surfaces, including
model/reasoning/collaboration/personality popups, permission prompts,
rate-limit UI, and connectors helpers.
- Phase 5: clean up the remaining constructor and orchestration code
once the larger behavior domains have moved out, leaving `chatwidget.rs`
as the composition layer.
## Why
Added support for UDS connections in `codex --remote`.
The TUI also now connects to the local app-server over UDS by default if
the app-server is running and set to listen for UDS connections.
## What Changed
- Introduced `RemoteAppServerEndpoint` with `WebSocket` and `UnixSocket`
variants.
- Reused the existing JSON-RPC-over-WebSocket protocol over either a TCP
WebSocket stream or a UDS stream.
- Updated `codex --remote` to accept `ws://host:port`,
`wss://host:port`, `unix://`, and `unix://PATH`.
- Kept `--remote-auth-token-env` restricted to `wss://` and loopback
`ws://` remotes.
- Added a fast TUI startup probe for the default daemon socket, falling
back to the embedded app server when the daemon is absent or
unresponsive.
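The accepted address forms could be parsed along these lines. This is a sketch under assumptions: the payload types of the `WebSocket` and `UnixSocket` variants, the reading of a bare `unix://` as "use the default daemon socket", and the `parse_remote` name are not taken from the PR.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq, Eq)]
pub enum RemoteAppServerEndpoint {
    /// Full `ws://` or `wss://` URL.
    WebSocket(String),
    /// `None` stands for the default daemon socket (assumed meaning of a
    /// bare `unix://`).
    UnixSocket(Option<PathBuf>),
}

pub fn parse_remote(addr: &str) -> Result<RemoteAppServerEndpoint, String> {
    if let Some(path) = addr.strip_prefix("unix://") {
        let path = if path.is_empty() { None } else { Some(PathBuf::from(path)) };
        return Ok(RemoteAppServerEndpoint::UnixSocket(path));
    }

    let rest = addr
        .strip_prefix("ws://")
        .or_else(|| addr.strip_prefix("wss://"));
    if let Some(rest) = rest {
        // Require an explicit `host:port` pair, matching the documented forms.
        return match rest.rsplit_once(':') {
            Some((host, port)) if !host.is_empty() && port.parse::<u16>().is_ok() => {
                Ok(RemoteAppServerEndpoint::WebSocket(addr.to_string()))
            }
            _ => Err(format!("expected host:port in remote address: {addr}")),
        };
    }

    Err(format!("unsupported remote address: {addr}"))
}
```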
## Verification
- Manually verified that the updated remote flow works.
- Added coverage for UDS remote round trips, WebSocket auth headers,
auth-token transport policy, remote address parsing, and missing-daemon
fallback.
- Ran focused remote test coverage locally.
- Keep shared-with-me as the plugin/list request kind, but return
private plugins under workspace-shared-with-me-private.
- Add workspace-shared-with-me-unlisted for installed workspace plugins
with UNLISTED discoverability.
## Why
This builds on the handler-owned spec refactor by moving deferred
tool-search metadata to the same handlers that already own tool specs.
The registry builder no longer needs a separate prebuilt
`tool_search_entries` path; it can collect searchable entries from
deferred handlers directly.
## What changed
- Added `search_info()` to tool handlers and implemented it for MCP and
dynamic handlers.
- Reused handler `spec()` output when constructing tool-search entries,
adapting it into the deferred `LoadableToolSpec` shape expected by
`tool_search`.
- Simplified `build_tool_registry_builder(...)` so `tool_search`
registration is based on deferred handlers with search info.
- Removed the old standalone search-entry builders and now-unused
`codex-tools` discovery helper exports.
## Verification
- `cargo test -p codex-core tools::handlers::tool_search::tests:: --
--nocapture`
- `cargo test -p codex-core tools::spec_plan::tests::search_tool --
--nocapture`
- `cargo test -p codex-core tools::spec::tests:: -- --nocapture`
- `cargo test -p codex-core tools::spec_plan::tests:: -- --nocapture`
- `cargo test -p codex-tools`
- `just fix -p codex-core`
- `just fix -p codex-tools`
## Why
Code mode already builds the merged nested `ToolSpec`s that feed the
`exec` prompt. Keeping a separate `tool_namespaces` map in the planning
path duplicated that metadata and left extra wrapper plumbing in
`spec.rs`.
## What changed
- derive code-mode namespace descriptions from the merged
`ToolSpec::Namespace` entries before building the code-mode handlers
- extract `build_code_mode_handlers(...)` so the code-mode-specific
planning stays in one place
- remove `tool_namespaces` from `ToolRegistryBuildParams`
- delete the now-unused `McpToolPlanInputs` wrapper and related test
helper plumbing
## Testing
- `cargo test -p codex-core spec_plan`
## Why
`CODEX_RS_SSE_FIXTURE` let integration-style CLI, exec, and TUI tests
bypass the normal Responses transport by reading SSE from local files.
That kept test-only behavior wired through production client code. The
affected tests can stay hermetic by using the existing
`core_test_support::responses` mock server and passing `openai_base_url`
instead.
## What Changed
- Removed the `CODEX_RS_SSE_FIXTURE` flag,
`codex_api::stream_from_fixture`, the `env-flags` dependency, and the
checked-in SSE fixture files.
- Repointed the affected core, exec, and TUI tests at `MockServer` with
the existing SSE event constructors.
- Removed the Bazel test data plumbing for the deleted fixtures and
refreshed cargo/Bazel lock state.
## Verification
- `cargo build -p codex-cli`
- `cargo test -p codex-api`
- `cargo test -p codex-core --test all responses_api_stream_cli`
- `cargo test -p codex-core --test all
integration_creates_and_checks_session_file`
- `cargo test -p codex-exec --test all ephemeral`
- `cargo test -p codex-exec --test all resume`
- `cargo test -p codex-tui --test all
resume_startup_does_not_consume_model_availability_nux_count`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `just fix -p codex-api -p codex-core -p codex-exec -p codex-tui`
- `git diff --check`
## Why
Enterprise-managed hook policy needs a narrow way to require Codex to
ignore user-controlled lifecycle hooks without adopting the broader
trust-precedence model from earlier hook work. This keeps the policy
anchored in `requirements.toml`, so admins can opt into managed hooks
only while normal `config.toml` files cannot enable the restriction
themselves.
## What changed
- Added `allow_managed_hooks_only` to the requirements data flow and
preserved explicit `false` values.
- Added it to `/debug-config` as well.
- Marked MDM, system, and legacy managed config layers as managed for
hook discovery.
- Updated hook discovery so `allow_managed_hooks_only = true`:
- keeps managed requirements hooks and managed config-layer hooks,
- skips user/project/session `hooks.json` and `[hooks]` entries with
concise startup warnings,
- skips current unmanaged plugin hooks,
- ignores any `allow_managed_hooks_only` key placed in ordinary
`config.toml` layers.
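The filtering rule above can be approximated as follows. The `HookSource` enum, the `Discovery` result, and the warning text are hypothetical; only the keep/skip policy mirrors the description.

```rust
/// Hypothetical classification of where a hook was declared.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookSource {
    ManagedRequirements,
    ManagedConfigLayer,
    User,
    Project,
    Session,
    UnmanagedPlugin,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Discovery {
    pub kept: Vec<&'static str>,
    pub warnings: Vec<String>,
}

pub fn discover_hooks(
    hooks: &[(&'static str, HookSource)],
    allow_managed_hooks_only: bool,
) -> Discovery {
    let mut out = Discovery::default();
    for &(name, source) in hooks {
        let managed = matches!(
            source,
            HookSource::ManagedRequirements | HookSource::ManagedConfigLayer
        );
        if managed || !allow_managed_hooks_only {
            out.kept.push(name);
        } else if source != HookSource::UnmanagedPlugin {
            // User, project, and session hooks are skipped with a warning;
            // unmanaged plugin hooks are skipped silently in this sketch.
            out.warnings.push(format!(
                "skipping {source:?} hook `{name}`: only managed hooks are allowed"
            ));
        }
    }
    out
}
```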
## Why
Hook semantics treat `session_id` as shared across a root session and
its subagents. Codex hooks were still emitting the current thread ID,
which made spawned agents look like independent sessions and made it
harder for hook integrations to correlate work across a root thread and
its spawned helpers.
This change makes hooks use Codex's existing shared session identity so
hook `session_id` matches the root-thread session across spawned
subagents.
## What Changed
- switch hook payloads to use the existing shared session identity from
core instead of the current thread ID
- cover all hook surfaces that expose `session_id`, including
`SessionStart`, tool hooks, compact hooks, prompt-submit hooks, stop
hooks, and legacy after-agent dispatch
## Why
Guardian review selection was hard-coded in `core`, which worked for the
default OpenAI path but did not give provider implementations a way to
choose backend-specific reviewer model IDs. That matters for Amazon
Bedrock: guardian review should run through the Bedrock/Mantle provider
using Bedrock's `openai.gpt-5.4` model ID, instead of accidentally
selecting a reviewer model that implies the OpenAI backend.
## What Changed
- Added provider-owned approval review model selection via
`ModelProvider::approval_review_model_selection`.
- Moved the existing default selection policy into the provider
abstraction: prefer the requested reviewer model when it is available,
otherwise fall back to the active turn model, preferring `Low` reasoning
when supported.
- Added an Amazon Bedrock override that pins guardian review to
`openai.gpt-5.4` with `Low` reasoning.
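One way to read the default policy and the Bedrock override is the sketch below. The `ModelInfo`, `Selection`, and `Reasoning` types and both function names are hypothetical; the real `ModelProvider::approval_review_model_selection` signature is not shown in the PR text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reasoning {
    Low,
    Medium,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Selection {
    pub model: String,
    /// `None` means "use the model's default reasoning effort".
    pub reasoning: Option<Reasoning>,
}

/// Hypothetical catalog entry.
pub struct ModelInfo {
    pub id: &'static str,
    pub supports_low: bool,
}

/// Default policy: requested reviewer model if available, otherwise the
/// active turn model; `Low` reasoning whenever the chosen model supports it.
pub fn default_selection(requested: &str, active: &str, available: &[ModelInfo]) -> Selection {
    let pick = available
        .iter()
        .find(|model| model.id == requested)
        .or_else(|| available.iter().find(|model| model.id == active));
    match pick {
        Some(model) => Selection {
            model: model.id.to_string(),
            reasoning: model.supports_low.then_some(Reasoning::Low),
        },
        None => Selection { model: active.to_string(), reasoning: None },
    }
}

/// Provider override in the spirit of the Bedrock change: always pin.
pub fn bedrock_selection() -> Selection {
    Selection { model: "openai.gpt-5.4".to_string(), reasoning: Some(Reasoning::Low) }
}
```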
## Why
PR #21843 removed the TCP websocket app-server listener, but that also
removed functionality that still needs to exist. Restoring it as-is
would reopen the old remote exposure problem, so this keeps the restored
listener while making remote and non-loopback usage require explicit
auth.
## What Changed
- Mostly reverts #21843 and reapplies the small merge-conflict
resolutions needed on top of current main.
- Restores ws://IP:PORT parsing, the app-server TCP websocket acceptor,
websocket auth CLI flags, and the associated tests.
- The only intentional behavior change from the restored code is that
non-loopback websocket listeners now fail startup unless --ws-auth
capability-token or --ws-auth signed-bearer-token is configured.
Loopback listeners remain available for local and SSH-forwarding
workflows.
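The startup check amounts to something like the following sketch. The `WsAuth` enum and `check_listener` function are hypothetical names; the real logic lives in the files listed under Reviewer Focus.

```rust
use std::net::SocketAddr;

/// Hypothetical mirror of the `--ws-auth` modes named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WsAuth {
    None,
    CapabilityToken,
    SignedBearerToken,
}

/// Rejects unauthenticated listeners on anything other than a loopback
/// address before any connection is accepted.
pub fn check_listener(addr: SocketAddr, auth: WsAuth) -> Result<(), String> {
    if !addr.ip().is_loopback() && auth == WsAuth::None {
        return Err(format!(
            "refusing unauthenticated non-loopback websocket listener on {addr}"
        ));
    }
    Ok(())
}
```

Note that `0.0.0.0` is the unspecified address rather than a loopback address, so it falls on the "requires auth" side of the check.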
## Reviewer Focus
Please focus review on the small auth-enforcement delta layered on top
of the revert:
- codex-rs/app-server-transport/src/transport/websocket.rs:
start_websocket_acceptor now rejects unauthenticated non-loopback
websocket binds before accepting connections.
- codex-rs/app-server-transport/src/transport/auth.rs: helper logic
classifies unauthenticated non-loopback listeners.
- codex-rs/app-server/tests/suite/v2/connection_handling_websocket.rs:
tests cover unauthenticated ws://0.0.0.0 startup rejection and
authenticated non-loopback capability-token startup.
Everything else is intended to be revert/merge-conflict restoration
rather than new product behavior.
## Verification
- Manually verified that TUI remoting is restored and that auth is
enforced for non-localhost urls.
- Adds localVersion to plugin summaries and remoteVersion to share
context, including generated API schemas.
- Hydrates local and remote plugin versions from manifests and remote
release metadata.
- Adds default-on plugin_sharing gate for shared-with-me listing and
plugin/share/save, with disabled-path errors and focused coverage.
## Summary
Plugin Creator now documents the shorter local-plugin handoff URL that
the app can interpret directly.
[#22221](https://github.com/openai/codex/pull/22221) teaches the skill
to end marketplace-backed creation flows with named View and Share
links; this follow-up updates those examples so the skill only emits the
normalized plugin name, the absolute marketplace path, and optional
share mode.
The documented shape is:
```txt
codex://plugins/<normalized-plugin-name>?marketplacePath=<absolute-marketplace-json-path>
codex://plugins/<normalized-plugin-name>?marketplacePath=<absolute-marketplace-json-path>&mode=share
```
The skill text now states exactly where the normalized plugin name
belongs, exactly where the absolute marketplace path belongs, and that
it should not add `pluginName` or `hostId` query parameters.
## Testing
Tests: plugin-creator skill validation.
## Why
`remote_control` can appear in `config.toml`, CLI feature overrides, and
the app-server config APIs. Before this PR, app-server startup treated
`config.features.enabled(Feature::RemoteControl)` as the signal to start
remote control ([base
code](5e3ee5eddf/codex-rs/app-server/src/lib.rs (L678-L680))).
That meant a user with:
```toml
[features]
remote_control = true
```
would accidentally opt every app-server process into remote control.
Remote-control startup should instead be a per-process launch decision
made by CLI flags.
## What Changed
- Marks `Feature::RemoteControl` as `Stage::Removed`, keeping
`remote_control` as a known compatibility key while making it
config-inert.
- Adds a hidden `--remote-control` process flag to `codex app-server`
and standalone `codex-app-server`.
- Plumbs that flag through
`AppServerRuntimeOptions.remote_control_enabled` and makes app-server
startup use only that runtime option to decide whether to start remote
control.
- Removes the app-server config mutation hook that reloaded config and
toggled remote control at runtime.
- Updates managed daemon spawning to use `codex app-server
--remote-control --listen unix://` instead of `--enable remote_control`.
Config APIs can still list, read, write, and set `remote_control`; those
operations just no longer affect remote-control process enrollment.
## Why
`tool_search` still carries the server-specific result-cap path added in
#17684 for `computer-use`: when the model omitted `limit`, a matching
result expanded the search to 20 and then `limit_results_by_bucket`
applied per-bucket caps. That makes default result handling depend on a
one-off server exception instead of the single
`TOOL_SEARCH_DEFAULT_LIMIT` path.
This PR removes that custom branch so omitted `limit` values use the
ordinary global default consistently. The implementation being retired
is the pre-change bucketed search path in
[`tool_search.rs`](5e3ee5eddf/codex-rs/core/src/tools/handlers/tool_search.rs (L121-L190)).
## What changed
- Collapse `ToolSearchHandler::search` back to one BM25 search with the
resolved limit.
- Remove `limit_results_by_bucket`, the `computer-use` constants, and
the omitted-limit plumbing that only existed for the override.
- Drop dead `ToolSearchEntry::limit_bucket` metadata from deferred MCP
and dynamic search entries.
- Remove tests and helpers that only asserted the deleted override
behavior.
- Add direct handler-level unit coverage for omitted/default and
explicit `tool_search` result limits.
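After the cleanup, limit resolution reduces to a single default path, roughly as sketched here. The constant's value and the `resolve_limit` and `top_results` helpers are hypothetical; the PR names `TOOL_SEARCH_DEFAULT_LIMIT` but does not state its value, and the real handler ranks with BM25 rather than taking a pre-ranked slice.

```rust
/// Hypothetical value chosen for this sketch only.
pub const TOOL_SEARCH_DEFAULT_LIMIT: usize = 8;

/// An omitted limit falls back to the one global default.
pub fn resolve_limit(limit: Option<usize>) -> usize {
    limit.unwrap_or(TOOL_SEARCH_DEFAULT_LIMIT)
}

/// Truncates an already-ranked result list to the resolved limit.
pub fn top_results<'a>(ranked: &[&'a str], limit: Option<usize>) -> Vec<&'a str> {
    ranked.iter().copied().take(resolve_limit(limit)).collect()
}
```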
## Validation
- `cargo test -p codex-core tool_search`
- The matching unit tests passed, including the new omitted/default and
explicit result-limit coverage.
- The broader `--test all` search-tool fixture phase then failed before
sending mocked response requests in
`tool_search_indexes_only_enabled_non_app_mcp_tools` and
`tool_search_uses_non_app_mcp_server_instructions_as_namespace_description`.
- `cargo test -p codex-core`
- The touched tool-search coverage passed before the run later aborted
in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`
with a stack overflow.
## Why
`chatwidget.rs` is still carrying too many unrelated responsibilities in
one file. After #21866 consolidated some of the state it tracks, this
starts the next phase by moving coherent state/helper clusters out of
the main module without changing behavior.
This PR is intentionally mechanical: it only moves existing functions,
structs, and helpers into focused modules so the boundaries are easier
to review before the less mechanical refactors that should follow.
## What Changed
- Moved user-message, composer, queue, pending steer, and merge/remap
helpers into `codex-rs/tui/src/chatwidget/user_messages.rs`.
- Added `codex-rs/tui/src/chatwidget/exec_state.rs` for unified exec
bookkeeping helpers.
- Added `codex-rs/tui/src/chatwidget/rate_limits.rs` for rate-limit
warning, prompt, and error classification state.
- Moved plugin list fetch and install auth-flow state into
`codex-rs/tui/src/chatwidget/plugins.rs`.
- Made a couple of test-only `VecDeque` imports explicit now that those
tests no longer inherit the parent module import.
## Verification
- `cargo test -p codex-tui` was run
## Follow-On Refactor Phases
This PR is phase 1: mechanical helper and state moves. Planned follow-up
PRs:
- Phase 2: extract input and submission flow, including queued user
messages, shell prompt submission, pending steer restoration, and thread
input snapshot/restore behavior.
- Phase 3: extract protocol, replay, streaming, and tool lifecycle
handling, while preserving active-cell grouping, transcript
invalidation, interrupt deferral, and final-message separator behavior.
- Phase 4: extract settings, popups, and status surfaces, including
model/reasoning/collaboration/personality popups, permission prompts,
rate-limit UI, and connectors helpers.
- Phase 5: clean up the remaining constructor and orchestration code
once the larger behavior domains have moved out, leaving `chatwidget.rs`
as the composition layer.
- make ThreadStore::update_thread_metadata accept a broad range of
metadata patches
- keep ThreadStore::append_items as raw canonical history append (no
metadata side effects)
- in the local store, write these metadata updates to a combination of
sqlite and rollout jsonl files for backwards compatibility. The local
store special-cases which fields go into jsonl versus sqlite, confining
the awkwardness to just this implementation
- in remote stores we can simply persist the metadata directly to a
database, with no special casing required.
- move the "implicit metadata updates triggered by appending rollout
items" from the RolloutRecorder (which is local-threadstore-specific) to
the LiveThread layer above the ThreadStore, inside of a private helper
utility called ThreadMetadataSync. LiveThread calls ThreadStore
append_items and update_metadata separately.
- Add a generic update metadata method to ThreadManager that works on
both live threads and "cold" threads
- Call that ThreadManager method from app server code, so app server
doesn't need to worry about whether the thread is live or not
## Why
`tool_search` already had solid end-to-end coverage for discovery and
follow-up execution, but it did not prove that distinct pieces of
indexed search text actually work in integration. In particular, we were
not exercising whether unique tool names, descriptions, namespaces,
underscore-expanded dynamic names, and schema-property terms were
sufficient to surface the expected deferred tools.
This change adds focused integration coverage for those term sources so
regressions in search text construction are caught by a real `TestCodex`
flow instead of only by lower-level unit tests.
## What changed
- added a small helper in `core/tests/suite/search_tool.rs` to assert
that a `tool_search_output` contains an expected namespace child tool
- added an MCP integration test that issues several `tool_search_call`s
and verifies distinct query terms match the expected app tools:
- exact tool name: `calendar_timezone_option_99`
- tool description phrase: `uploaded document`
- top-level schema property: `starts_at`
- added a dynamic-tool integration test that verifies distinct query
terms match the expected deferred dynamic tool:
- exact name: `quasar_ping_beacon`
- underscore-expanded name: `quasar ping beacon`
- description phrase: `saffron metronome`
- namespace: `orbit_ops`
- schema property: `chrono_spec`
## Validation
- `cargo test -p codex-core tool_search_matches_`
## Docs
No documentation update needed.
## Why
This is the base PR in the split stack for the permissions migration. It
isolates stack-safety work that had been mixed into the larger
permissions PR, so reviewers can evaluate the async-future changes
separately from the permissions model changes in #22267.
The main risk this addresses is large or recursive multi-agent futures
overflowing smaller runner stacks. A follow-up review also called out
that `shutdown_live_agent` must remain quiescent: callers should not
remove a live agent from tracking or release its spawn slot until the
worker loop has actually terminated.
## What Changed
- Boxes the large async futures in the multi-agent spawn, resume, and
close tool handlers.
- Boxes the `AgentControl` spawn and recursive close/shutdown paths that
can otherwise build very deep futures.
- Keeps `shutdown_live_agent` waiting for thread termination before
removing/releasing the live agent, preserving the previous shutdown
ordering while still boxing the recursive close path.
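The boxing pattern for a recursive async path looks roughly like the sketch below. `Agent`, `close_subtree`, `chain`, and the busy-polling `block_on` helper are hypothetical; the real `AgentControl` code is not reproduced here. Each recursive call returns a heap-allocated future, so a parent future stores only a pointer to its child instead of embedding the child's whole state.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical agent tree.
pub struct Agent {
    pub children: Vec<Agent>,
}

/// Closes an agent and all of its descendants, returning how many were closed.
/// Returning `Pin<Box<dyn Future>>` is what allows the recursion.
pub fn close_subtree(agent: &Agent) -> Pin<Box<dyn Future<Output = usize> + '_>> {
    Box::pin(async move {
        let mut closed = 1;
        for child in &agent.children {
            closed += close_subtree(child).await;
        }
        closed
    })
}

/// Builds a single-child chain containing `depth` agents.
pub fn chain(depth: usize) -> Agent {
    let mut agent = Agent { children: Vec::new() };
    for _ in 1..depth {
        agent = Agent { children: vec![agent] };
    }
    agent
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, adequate only for this demonstration.
pub fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
        std::thread::yield_now();
    }
}
```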
## Verification Strategy
The focused local coverage was `cargo test -p codex-core multi_agents`,
which exercises the multi-agent spawn/resume/close handlers, cascade
close/resume behavior, and the shutdown path touched by this PR.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22266).
* #22330
* #22329
* #22328
* #22327
* __->__ #22266
Introduce execute_to_pending and wait_to_pending APIs that freeze
pending-mode runtimes until an explicit resume, while preserving the
existing continuously-running execute path. Add runtime and service
coverage for pending, resume, completion, and freeze behavior.
## Summary
This refactor makes tool handlers the owner of the specs they can
publish, so registry construction can register handlers once and
separately publish only the specs that should be model-visible.
The main motivation is deferred tools: MCP and dynamic tools still need
handlers registered up front, but deferred tools should be discoverable
through `tool_search` rather than emitted in the initial tool spec list.
## What changed
- `McpHandler` and `DynamicToolHandler` can return their own `ToolSpec`.
- `build_tool_registry_builder` now collects handlers, registers them
through the no-spec path, and publishes only non-deferred handler specs.
- Deferred MCP and dynamic tool names are combined into one
`all_deferred_tools` set that drives spec filtering, code-mode
deferred-tool signaling, and `tool_search` registration.
- `tool_search` registration now requires both deferred tools and
`namespace_tools`.
- Namespace specs are merged in `spec_plan`, preserving top-level spec
order, sorting tools within each namespace, and backfilling empty
namespace descriptions.
- Hosted web search and image-generation specs are included in the
collected spec vector before namespace merge/publication, and tool-name
tests that should not care about hosted relative order now compare sets.
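
A rough sketch of the register-once, publish-selectively idea follows. The types here (`ToolSpec`, `ToolHandler`, `SimpleHandler`, `Registry`, `build_registry`) are simplified stand-ins invented for illustration and do not mirror the real `codex-core` definitions.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for a model-visible tool spec.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolSpec {
    pub name: String,
}

/// Hypothetical handler trait: each handler owns the spec it can publish.
pub trait ToolHandler {
    fn name(&self) -> &str;
    fn spec(&self) -> Option<ToolSpec>;
}

pub struct SimpleHandler {
    pub name: String,
}

impl ToolHandler for SimpleHandler {
    fn name(&self) -> &str {
        &self.name
    }

    fn spec(&self) -> Option<ToolSpec> {
        Some(ToolSpec { name: self.name.clone() })
    }
}

pub struct Registry {
    /// Every handler is registered, deferred or not.
    pub handlers: Vec<Box<dyn ToolHandler>>,
    /// Only non-deferred specs are published to the model.
    pub published: Vec<ToolSpec>,
    pub tool_search_enabled: bool,
}

pub fn build_registry(
    handlers: Vec<Box<dyn ToolHandler>>,
    all_deferred_tools: &BTreeSet<String>,
    namespace_tools: bool,
) -> Registry {
    let published: Vec<ToolSpec> = handlers
        .iter()
        .filter(|h| !all_deferred_tools.contains(h.name()))
        .filter_map(|h| h.spec())
        .collect();
    Registry {
        handlers,
        published,
        // Mirrors the rule above: tool search needs deferred tools and
        // namespace tools to both be present.
        tool_search_enabled: !all_deferred_tools.is_empty() && namespace_tools,
    }
}
```

In this sketch a deferred tool stays dispatchable because its handler is still registered; it is only absent from the initial spec list.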
## Testing
- `cargo test -p codex-core tools::spec::tests:: -- --nocapture`
- `cargo test -p codex-core tools::spec_plan::tests:: -- --nocapture`
- `cargo test -p codex-core
tools::router::tests::specs_filter_deferred_dynamic_tools --
--nocapture`
- `cargo test -p codex-core
suite::prompt_caching::prompt_tools_are_consistent_across_requests --
--nocapture`
- `just fmt`
- `just fix -p codex-core`
- `cargo test -p codex-core -- --skip
tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`
passed the library suite after skipping the known stack-overflowing unit
test.
Full `cargo test -p codex-core` currently hits a stack overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`;
the same focused test reproduces on `origin/main`.
## Summary
- make workspace owner nudge handling unconditional in the TUI now that
it is fully rolled out
- keep `workspace_owner_usage_nudge` as a removed no-op compatibility
flag so old configs/app overrides remain accepted during rollout
- remove flag-disabled test setup
## Companion PR
- https://github.com/openai/openai/pull/876351 removes the Codex Apps
Statsig rollout gate override after this change is available to the
app/runtime path
## Validation
- `just write-config-schema`
- `just fmt`
- `cargo test -p codex-features`
- `cargo test -p codex-tui status_and_layout`
## Why
Remote exec-server now needs one executor websocket to serve multiple
harness JSON-RPC sessions. Rendezvous routes by `stream_id`, and the
exec-server side needs to use the same stable relay frame contract
instead of a hand-rolled JSON shape.
The relay protocol also needs to make ownership boundaries clear:
harness and executor endpoints own sequencing, acks, retries, duplicate
suppression, segmentation, and reassembly; rendezvous only routes
frames.
## What Changed
- Add the checked-in `codex.exec_server.relay.v1.RelayMessageFrame`
proto plus generated prost bindings for `codex-exec-server`.
- Encode remote harness/executor relay traffic as binary protobuf
websocket frames while keeping local websocket JSON-RPC unchanged.
- Demux executor-side relay streams into independent
`ConnectionProcessor` sessions keyed by `stream_id`.
- Add a programmatic `RemoteExecutorConfig::with_bearer_token(...)`
constructor for non-CLI callers and integration tests.
- Add an integration test that starts the remote executor against a fake
registry/rendezvous websocket and verifies two virtual streams share one
executor websocket without cross-talk, including per-stream reset
behavior.
- Document the remote relay envelope, sequence ranges, `ack`/`ack_bits`,
and endpoint responsibilities in `exec-server/README.md`.
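
The executor-side demultiplexing can be pictured with the sketch below. It is deliberately simplified: `RelayFrame`, `Session`, and `Demux` are hypothetical types, the real frame is the protobuf `RelayMessageFrame`, and sequencing, acks, and segmentation are left out entirely.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a decoded relay frame.
pub struct RelayFrame {
    pub stream_id: String,
    pub payload: Vec<u8>,
}

/// Stand-in for one independent JSON-RPC session.
#[derive(Default)]
pub struct Session {
    pub received: Vec<Vec<u8>>,
}

/// Routes frames from one shared websocket to per-stream sessions.
#[derive(Default)]
pub struct Demux {
    sessions: HashMap<String, Session>,
}

impl Demux {
    /// Deliver a frame to the session for its stream, creating it on first use.
    pub fn route(&mut self, frame: RelayFrame) {
        self.sessions
            .entry(frame.stream_id)
            .or_default()
            .received
            .push(frame.payload);
    }

    /// Drop one stream's session without touching the others.
    pub fn reset(&mut self, stream_id: &str) -> bool {
        self.sessions.remove(stream_id).is_some()
    }

    pub fn session(&self, stream_id: &str) -> Option<&Session> {
        self.sessions.get(stream_id)
    }
}
```

Keying all state by `stream_id` is what keeps two virtual streams on one websocket from seeing each other's traffic, and it makes a per-stream reset a local operation.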
## Verification
- `cargo test -p codex-exec-server --test relay
multiplexed_remote_executor_routes_independent_virtual_streams --
--exact`
- `cargo test -p codex-exec-server --test relay`
- `cargo test -p codex-exec-server` passed outside the sandbox. The
sandboxed run hit macOS `sandbox-exec: sandbox_apply: Operation not
permitted` in filesystem sandbox tests.
## Why
Windows CI has been timing out in
`configured_pet_load_is_deferred_until_after_construction` while waiting
for the deferred configured-pet load event.
The test still needs to prove construction returns before the pet image
is available, but the background load slices the built-in pet
spritesheet into frame cache files. That work can exceed the old 2
second deadline on slower or more contended CI machines.
## What Changed
- Increased the test wait for `ConfiguredPetLoaded` from 2 seconds to 30
seconds.
- Kept the post-construction assertion intact so the test still verifies
that the pet is not loaded synchronously during `ChatWidget`
construction.
## How to Test
Targeted tests:
- `cargo test -p codex-tui
configured_pet_load_is_deferred_until_after_construction`
- `just argument-comment-lint`
Additional check:
- `cargo test -p codex-tui` was run, but the broader crate suite did not
complete successfully due to unrelated existing failures:
-
`status::tests::status_permissions_full_disk_managed_without_network_is_external_sandbox`
-
`status::tests::status_permissions_full_disk_managed_with_network_is_danger_full_access`
- later abort in
`tests::fork_last_filters_latest_session_by_cwd_unless_show_all` from
stack overflow
## Why
Code mode only used nested spec lookup at execution time to rediscover
whether a nested tool should be invoked as a function tool or a freeform
tool.
That information is already present in the enabled tool metadata that
code mode builds to expose `tools.*` and `ALL_TOOLS`, so re-looking it
up from the router was redundant and kept execution coupled to a
separate spec lookup path.
## What Changed
- thread `CodeModeToolKind` through the code-mode runtime `ToolCall`
event and `CodeModeNestedToolCall`
- emit the nested tool kind directly from the V8 callback using the
already-enabled tool metadata
- build nested tool payloads from the propagated kind instead of calling
`find_spec`
- remove the now-unused `find_spec` plumbing from the router and
parallel runtime helpers
- add unit coverage for function vs freeform payload shaping and update
affected router tests
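
As an illustration of shaping the payload from the propagated kind, consider the sketch below. `CodeModeToolKind` is named in this PR, but its variants, the `NestedPayload` type, and `build_payload` are assumptions made for the example.

```rust
/// Assumed variants; the PR only names the type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CodeModeToolKind {
    Function,
    Freeform,
}

/// Hypothetical payload shapes for a nested tool call.
#[derive(Debug, PartialEq)]
pub enum NestedPayload {
    /// Function tools receive serialized JSON arguments.
    Function { arguments: String },
    /// Freeform tools receive the raw input text.
    Freeform { input: String },
}

/// Build the payload from the kind carried on the call itself, with no
/// separate spec lookup at execution time.
pub fn build_payload(kind: CodeModeToolKind, raw: &str) -> NestedPayload {
    match kind {
        CodeModeToolKind::Function => NestedPayload::Function {
            arguments: raw.to_string(),
        },
        CodeModeToolKind::Freeform => NestedPayload::Freeform {
            input: raw.to_string(),
        },
    }
}
```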
## Testing
- `cargo test -p codex-code-mode`
- `cargo test -p codex-core code_mode::tests`
- `cargo test -p codex-core
extension_tool_bundles_are_model_visible_and_dispatchable`
- `cargo test -p codex-core
model_visible_specs_filter_deferred_dynamic_tools`
## Summary
Adds `include_collaboration_mode_instructions`, the collaboration-mode
counterpart of the `include_permissions_instructions` config option.
It is intended for situations where the collaboration-mode instructions
should be kept out of the context.
## Testing
- [x] Added unit test
## Why
Tool dispatch had two serialization mechanisms:
- `supports_parallel_tool_calls` decides whether a tool participates in
the shared parallel-execution lock.
- `is_mutating` separately gated some calls inside dispatch.
That second hook no longer carried its weight. The remaining
parallel-support flag is already the per-tool concurrency policy, so
keeping a second mutating gate made dispatch harder to follow and left
behind extra session plumbing that only existed for that path.
## What changed
- Removed `is_mutating` from tool handlers and deleted the
`tool_call_gate` path that existed only to support it.
- Simplified dispatch and routing to rely on the existing per-tool
`supports_parallel_tool_calls` boolean.
- Dropped the now-unused handler overrides and related session/test
scaffolding.
- Kept the router/parallel tests focused on the surviving per-tool
behavior.
- Removed the unused `codex-utils-readiness` dependency from
`codex-core` as a follow-up fix for `cargo shear`.
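
The PR description does not show how the shared parallel-execution lock is implemented. One plausible shape, offered purely as an assumption, is a read/write lock in which parallel-capable tools take the shared side and all other tools take the exclusive side. `Dispatcher`, `run`, and `could_start` below are hypothetical names.

```rust
use std::sync::RwLock;

/// Hypothetical dispatcher with a single per-tool concurrency policy.
pub struct Dispatcher {
    gate: RwLock<()>,
}

impl Dispatcher {
    pub fn new() -> Self {
        Self { gate: RwLock::new(()) }
    }

    /// Run a tool call. Parallel-capable tools share the lock; all other
    /// tools hold it exclusively for the duration of the call.
    pub fn run<T>(&self, supports_parallel_tool_calls: bool, f: impl FnOnce() -> T) -> T {
        if supports_parallel_tool_calls {
            let _guard = self.gate.read().unwrap();
            f()
        } else {
            let _guard = self.gate.write().unwrap();
            f()
        }
    }

    /// Probe whether a call of the given kind could start right now.
    pub fn could_start(&self, supports_parallel_tool_calls: bool) -> bool {
        if supports_parallel_tool_calls {
            self.gate.try_read().is_ok()
        } else {
            self.gate.try_write().is_ok()
        }
    }
}
```

With a single boolean driving the lock mode, there is no need for a second `is_mutating`-style gate inside dispatch.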
## Testing
- `cargo test -p codex-core
parallel_support_does_not_match_namespaced_local_tool_names`
- `cargo test -p codex-core mcp_parallel_support_uses_handler_data`
- `cargo test -p codex-core
tools_without_handlers_do_not_support_parallel`
## Summary
- tighten unified exec sandbox initialization
- preserve the requested process workdir independently from sandbox
setup
- add regression coverage for the updated invariant
## Validation
- Ran `/tmp/cargo-tools/bin/just fmt`.
- Ran the targeted `codex-core` regression test successfully.
- Ran `cargo test -p codex-core`; it did not complete cleanly because
unrelated existing agent/config-loader tests failed and the run later
aborted on a stack overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.
Co-authored-by: Codex <noreply@openai.com>
## Why
The Codex App has animated pets, but the TUI had no equivalent ambient
companion surface. This brings that experience into terminal Codex while
keeping the main chat flow usable: the pet should feel present, but it
cannot cover transcript text, composer input, approvals, or picker
content.
The feature also needs to be terminal-aware. Different terminals support
different image protocols, tmux can interfere with image rendering, and
some users will want pets disabled entirely or anchored differently
depending on their layout.
<table>
<tr><td>
<img width="4110" height="2584" alt="CleanShot 2026-05-05 at 12 41
45@2x"
src="https://github.com/user-attachments/assets/68a1fcbc-2104-48d6-b834-69c6aaa95cdf"
/>
<p align="center">macOS - Ghostty, iTerm2 and WezTerm with Custom
Pet</p>
</td></tr>
<tr><td>
<p align="center">Windows Terminal</p>
</td></tr>
<tr><td>
<img width="3902" height="2752" alt="CleanShot 2026-05-05 at 12 39
02@2x"
src="https://github.com/user-attachments/assets/300e2931-6b00-467e-91cb-ab8e28470500"
/>
<p align="center">Linux - WezTerm and Ghostty</p>
</td></tr>
</table>
## What Changed
- Add a TUI ambient pet renderer in `codex-rs/tui/src/pets/`.
- Port the app-style pet animation states so the sprite changes with
task status, waiting-for-input states, review/ready states, and
failures.
- Add `/pets` selection UI with a preview pane, loading state, built-in
pet choices, and a first-row `Disable terminal pets` option.
- Download built-in pet spritesheets on demand from the same public CDN
path already used by Android, under
`https://persistent.oaistatic.com/codex/pets/v1/...`, and cache them
locally under `~/.codex/cache/tui-pets/`.
- Keep custom pets local.
- Add config support for pet selection, disabling pets, and choosing
whether the pet follows the composer bottom or anchors to the terminal
bottom.
- Reserve layout space around the pet so transcript wrapping, live
responses, and composer input do not render underneath the sprite.
- Gate image rendering by terminal capability, disable image pets under
tmux, and support both Kitty Graphics and SIXEL terminals.
- Add redraw cleanup for terminal image artifacts, including sixel cell
clearing.
## Current Scope
- This is an initial TUI version of ambient pets, not full App parity.
- It focuses on ambient sprite rendering, `/pets` selection, custom
pets, terminal capability gating, and on-demand CDN-backed built-in
assets.
- The ambient text overlay is currently disabled, so the TUI renders the
pet sprite without extra status text beside it.
## How to Test
1. Start Codex TUI in a terminal with image support.
2. Run `/pets`.
3. Confirm the picker shows built-in pets plus custom pets, and the
first item is `Disable terminal pets`.
4. On a fresh `~/.codex/cache/tui-pets/`, move onto a built-in pet and
confirm the first preview downloads the spritesheet from the shared
Codex pets CDN and renders successfully.
5. Move through the pet list and confirm subsequent built-in previews
use the local cache.
6. Select a pet, then send and receive messages. Confirm transcript and
composer text wrap before the pet instead of rendering underneath the
sprite.
7. Change the pet anchor setting and confirm the pet can either follow
the composer bottom or sit at the terminal bottom.
8. Return to `/pets`, choose `Disable terminal pets`, and confirm the
sprite disappears cleanly.
Targeted tests:
- `cargo test -p codex-tui ambient_pet_`
- `cargo test -p codex-tui
resize_reflow_wraps_transcript_early_when_pet_is_enabled`
- `cargo insta pending-snapshots`
Part 1 of guardian as extension. This binds all the logic needed to
spawn another agent from an extension and adds `ThreadId` to the start
thread collaborator.
Makes plugin summaries use config-style `plugin@marketplace` IDs while
exposing backend remote IDs separately as `remotePluginId`.
Also fixes the consistency issue with
`REMOTE_SHARED_WITH_ME_MARKETPLACE_NAME`.
## Why
The split filesystem policy stack already supports exact and glob
`access = none` read restrictions on macOS and Linux. Windows still
needed subprocess handling for those deny-read policies without claiming
enforcement from a backend that cannot provide it.
## Key finding
The unelevated restricted-token backend cannot safely enforce deny-read
overlays. Its `WRITE_RESTRICTED` token model is authoritative for write
checks, not read denials, so this PR intentionally fails that backend
closed when deny-read overrides are present instead of claiming
unsupported enforcement.
## What changed
This PR adds the Windows deny-read enforcement layer and makes the
backend split explicit:
- Resolves Windows deny-read filesystem policy entries into concrete ACL
targets.
- Preserves exact missing paths so they can be materialized and denied
before an enforceable sandboxed process starts.
- Snapshot-expands existing glob matches into ACL targets for Windows
subprocess enforcement.
- Honors `glob_scan_max_depth` when expanding Windows deny-read globs.
- Plans both the configured lexical path and the canonical target for
existing paths so reparse-point aliases are covered.
- Threads deny-read overrides through the elevated/logon-user Windows
sandbox backend and unified exec.
- Applies elevated deny-read ACLs synchronously before command launch
rather than delegating them to the background read-grant helper.
- Reconciles persistent deny-read ACEs per sandbox principal so policy
changes do not leave stale deny-read ACLs behind.
- Fails closed on the unelevated restricted-token backend when deny-read
overrides are present, because its `WRITE_RESTRICTED` token model is not
authoritative for read denials.
## Landed prerequisites
These prerequisite PRs are already on `main`:
1. #15979 `feat(permissions): add glob deny-read policy support`
2. #18096 `feat(sandbox): add glob deny-read platform enforcement`
3. #17740 `feat(config): support managed deny-read requirements`
This PR targets `main` directly and contains only the Windows deny-read
enforcement layer.
## Implementation notes
- Exact deny-read paths remain enforceable on the elevated path even
when they do not exist yet: Windows materializes the missing path before
applying the deny ACE, so the sandboxed command cannot create and read
it during the same run.
- Existing exact deny paths are preserved lexically until the ACL
planner, which then adds the canonical target as a second ACL target
when needed. That keeps both the configured alias and the resolved
object covered.
- Windows ACLs do not consume Codex glob syntax directly, so glob
deny-read entries are expanded to the concrete matches that exist before
process launch.
- Glob traversal deduplicates directory visits within each pattern walk
to avoid cycles, without collapsing distinct lexical roots that happen
to resolve to the same target.
- Persistent deny-read ACL state is keyed by sandbox principal SID, so
cleanup only removes ACEs owned by the same backend principal.
- Deny-read ACEs are fail-closed on the elevated path: setup aborts if
mandatory deny-read ACL application fails.
- Unelevated restricted-token sessions reject deny-read overrides early
instead of running with a silently unenforceable read policy.
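
The "lexical path plus canonical target" planning step can be sketched as below. This is an illustrative reduction only: `plan_acl_targets` is a hypothetical function, and the `resolve` closure stands in for whatever canonicalization the real planner performs (something like `std::fs::canonicalize`, which fails for paths that do not exist yet).

```rust
use std::path::{Path, PathBuf};

/// Plan the ACL targets for one exact deny-read path.
///
/// The configured (lexical) path is always kept. If it resolves to a
/// different object, the resolved path is added as a second target so that
/// both the alias and the real object are covered.
pub fn plan_acl_targets(
    lexical: &Path,
    resolve: impl Fn(&Path) -> Option<PathBuf>,
) -> Vec<PathBuf> {
    let mut targets = vec![lexical.to_path_buf()];
    if let Some(canonical) = resolve(lexical) {
        if canonical.as_path() != lexical {
            targets.push(canonical);
        }
    }
    targets
}
```

A missing path yields only the lexical target here, which matches the note above that such paths are materialized and denied before launch rather than resolved.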
## Verification
- `cargo test -p codex-core
windows_restricted_token_rejects_unreadable_split_carveouts`
- `just fmt`
- `just fix -p codex-core`
- `just fix -p codex-windows-sandbox`
- GitHub Actions rerun is in progress on the pushed head.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`ToolRouter::tool_supports_parallel()` was still consulting configured
specs when a handler lookup missed, even though parallel schedulability
is really a property of the executable handler. Keeping that metadata on
`ConfiguredToolSpec` duplicated state between the model-visible spec
layer and the runtime handler layer.
This change makes handlers the sole source of truth for parallel tool
support and removes the extra spec wrapper that only existed to carry
duplicated metadata.
## What changed
- removed `ConfiguredToolSpec` and store plain `ToolSpec` values in the
registry/router builder path
- changed `ToolRouter::tool_supports_parallel()` to consult only the
handler registry and fall back to `false`
- simplified spec collection and test helpers to operate directly on
`ToolSpec`
- updated router/spec tests to cover handler-owned parallel behavior and
the no-handler fallback
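
A minimal sketch of the handler-only lookup with a `false` fallback follows. The `Handler` struct and the `HashMap`-backed `ToolRouter` are simplified assumptions; only the method name `tool_supports_parallel` comes from the description above.

```rust
use std::collections::HashMap;

/// Simplified handler carrying its own concurrency policy.
pub struct Handler {
    pub supports_parallel_tool_calls: bool,
}

/// Simplified router backed by a name-to-handler map.
pub struct ToolRouter {
    handlers: HashMap<String, Handler>,
}

impl ToolRouter {
    pub fn new(handlers: HashMap<String, Handler>) -> Self {
        Self { handlers }
    }

    /// Consult only the handler registry; a tool with no handler is never
    /// treated as parallel-safe.
    pub fn tool_supports_parallel(&self, name: &str) -> bool {
        self.handlers
            .get(name)
            .map(|handler| handler.supports_parallel_tool_calls)
            .unwrap_or(false)
    }
}
```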
## Validation
- `cargo test -p codex-tools`
- `cargo test -p codex-core mcp_parallel_support_uses_handler_data`
- `cargo test -p codex-core
deferred_responses_api_tool_serializes_with_defer_loading`
- `cargo test -p codex-core
tools_without_handlers_do_not_support_parallel`
- `cargo test -p codex-core
request_plugin_install_can_be_registered_without_search_tool`
## Docs
No documentation updates needed.
## Why
Older sessions can contain model-warning records persisted as `user`
messages, including the unified exec process-limit warning, the
`apply_patch`-via-`exec_command` warning, and the model-mismatch
high-risk cyber fallback warning. Those warnings are no longer produced
as conversation history items, but when old sessions compact they should
still be recognized as injected context rather than preserved as real
user turns.
## What changed
- Removed `record_model_warning` and the production paths that emitted
these warning messages into conversation history.
- Added `LegacyUnifiedExecProcessLimitWarning`,
`LegacyApplyPatchExecCommandWarning`, and `LegacyModelMismatchWarning`
contextual fragments that are used only for matching old persisted
messages.
- Registered the legacy fragments with contextual user message detection
so compaction filters them through the existing fragment path.
- Added focused compaction coverage for old warning messages being
dropped during compacted-history processing.
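
The matching idea can be illustrated with the sketch below. The fragment names come from this PR, but the prefix strings are placeholders invented for the example (the real warning text is not reproduced here), and `LegacyFragment`, `is_injected_context`, and `compact_user_messages` are hypothetical.

```rust
/// A fragment that exists only to recognize old persisted messages.
pub struct LegacyFragment {
    pub name: &'static str,
    /// Placeholder marker; the real matcher uses the original warning text.
    pub prefix: &'static str,
}

pub const LEGACY_FRAGMENTS: [LegacyFragment; 3] = [
    LegacyFragment {
        name: "LegacyUnifiedExecProcessLimitWarning",
        prefix: "[placeholder:process-limit]",
    },
    LegacyFragment {
        name: "LegacyApplyPatchExecCommandWarning",
        prefix: "[placeholder:apply-patch]",
    },
    LegacyFragment {
        name: "LegacyModelMismatchWarning",
        prefix: "[placeholder:model-mismatch]",
    },
];

/// True when a persisted `user` message is really injected context.
pub fn is_injected_context(text: &str) -> bool {
    LEGACY_FRAGMENTS
        .iter()
        .any(|fragment| text.starts_with(fragment.prefix))
}

/// Keep only genuine user turns when compacting history.
pub fn compact_user_messages<'a>(messages: &[&'a str]) -> Vec<&'a str> {
    messages
        .iter()
        .copied()
        .filter(|message| !is_injected_context(message))
        .collect()
}
```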
## Testing
- `cargo test -p codex-core warning`
- `just fix -p codex-core`
## Why
`PreToolUse` already exposes `updatedInput` in its hook output schema,
but Codex currently rejects it instead of applying the rewrite. That
leaves hook authors unable to make the documented pre-execution
adjustment to a tool call before it runs.
## What
- Accept `updatedInput` from `PreToolUse` hooks when paired with
`permissionDecision: "allow"`.
- Apply the rewritten input before dispatch so the tool executes the
updated payload, not the original one.
- Preserve the stable hook-facing compatibility shapes that
participating tool handlers expose:
- Bash-like tools (`shell`, `container.exec`, `local_shell`,
`shell_command`, `exec_command`) use `{ "command": ... }`.
- `apply_patch` exposes its patch body through the same command-shaped
hook contract.
- MCP tools expose their JSON argument object directly.
- Keep each participating tool handler responsible for translating
hook-facing `updatedInput` back into its concrete invocation shape.
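
The acceptance rule for a Bash-like tool can be sketched as follows. The struct below is a hand-rolled stand-in for the parsed hook output, not the real schema type, and `rewritten_command` is a hypothetical helper; it models only the rewrite decision, not deny or ask handling.

```rust
/// Simplified view of a parsed `PreToolUse` hook output for a Bash-like
/// tool, where `updatedInput` has the `{ "command": ... }` shape.
pub struct PreToolUseOutput {
    /// Value of `permissionDecision`, if the hook returned one.
    pub permission_decision: Option<String>,
    /// The `command` field of `updatedInput`, if the hook returned one.
    pub updated_command: Option<String>,
}

/// Returns the rewritten command to execute, if the hook supplied one that
/// should be honored. A rewrite counts only when paired with "allow".
pub fn rewritten_command(output: &PreToolUseOutput) -> Option<&str> {
    match (output.permission_decision.as_deref(), output.updated_command.as_deref()) {
        (Some("allow"), Some(command)) => Some(command),
        _ => None,
    }
}

/// The command the tool should run: the rewrite if accepted, else the original.
pub fn command_to_dispatch<'a>(original: &'a str, output: &'a PreToolUseOutput) -> &'a str {
    rewritten_command(output).unwrap_or(original)
}
```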
## Verification
Direct Bash-like rewrite coverage:
- `pre_tool_use_rewrites_shell_before_execution`
- `pre_tool_use_rewrites_container_exec_before_execution`
- `pre_tool_use_rewrites_local_shell_before_execution`
- `pre_tool_use_rewrites_shell_command_before_execution`
- `pre_tool_use_rewrites_exec_command_before_execution`
These cases assert that each supported Bash-like surface runs only the
rewritten command while the hook still observes the original `{
"command": ... }` input.
`pre_tool_use_rewrites_apply_patch_before_execution`
- Model emits one patch.
- Hook swaps in a different patch.
- Asserts only the rewritten file is created, and the hook saw the
original patch.
`pre_tool_use_rewrites_code_mode_nested_exec_command_before_execution`
- Model runs one nested shell command from code mode.
- Hook rewrites it.
- Asserts only the rewritten command runs, and the hook saw the original
nested input.
`pre_tool_use_rewrites_mcp_tool_before_execution`
- Model calls the RMCP echo tool.
- Hook rewrites the MCP arguments.
- Asserts the MCP server receives and returns the rewritten message, not
the original one.
## Summary
- create a selected-cwd filesystem sandbox context for view_image
metadata and file reads in both local and remote environments
- add a local restricted-profile regression test for the previously
unsandboxed read path
## Validation
- `just fmt`
- `bazel test --bes_backend= --bes_results_url= --test_output=errors
--test_filter=view_image::tests::handle_passes_sandbox_context_for_local_filesystem_reads
//codex-rs/core:core-unit-tests`
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Plugin creation now defaults to the personal marketplace path and ends
with a readable handoff back into Codex after a marketplace-backed
scaffold.
Before this change, `plugin-creator` centered repo-local marketplace
updates and did not clearly guide the agent to return the user to the
created plugin afterward. This PR updates the bundled system skill so
marketplace-backed scaffolds default to `~/plugins/<plugin-name>` plus
`~/.agents/plugins/marketplace.json`, ask for user intent only when an
existing repo marketplace makes personal vs team scope ambiguous, and
end with named Markdown deeplinks labeled `View <plugin-name>` and
`Share <plugin-name>`.
## What changed
- default marketplace-backed creation to the personal plugin location
- document the explicit repo/team override path for codebases that
should own the plugin entry
- ask personal vs team only when the current Git repo already has
`.agents/plugins/marketplace.json` and the user has not stated scope
- require named Markdown deeplinks after marketplace-backed creation so
the final response returns the user to the exact plugin cleanly
- keep the deeplink targets precise with real absolute `marketplacePath`
and normalized `pluginName` values
- align the bundled prompt, scaffold help text, and marketplace
reference spec with the new default
## Testing
Tests: targeted skill validation, Python compile checks,
personal-default scaffold smoke, repo-override scaffold smoke, and
whitespace checks.
## Why
The MCP tool path had accumulated a few core-owned special cases: a
dedicated payload variant, resolver plumbing, a legacy `AfterToolUse`
translation path, and a side channel for parallel-call metadata. That
made `ToolRegistry` and the spec builder know more about MCP than they
needed to.
This change moves MCP-specific execution details back onto `ToolInfo`
and `McpHandler` so `codex-core` can treat MCP calls like normal
function calls while still preserving MCP-specific dispatch and
telemetry behavior where it belongs.
## What changed
- removed `resolve_mcp_tool_info`, `ToolPayload::Mcp`, `ToolKind`, and
the remaining registry-side MCP resolver path
- stored MCP routing metadata directly on `McpHandler` and `ToolInfo`,
including `supports_parallel_tool_calls`
- deleted the legacy `AfterToolUse` consumer in `core`, which removes
the need for handler-specific `after_tool_use_payload` implementations
- switched tool-result telemetry to handler-provided tags and kept
MCP-specific dispatch payload construction inside the handler
- simplified tool spec planning/building by passing `ToolInfo` directly
and dropping the direct/deferred MCP wrapper structs and the
parallel-server side table
## Testing
- `cargo check -p codex-core -p codex-mcp -p codex-otel`
- `cargo test -p codex-core
mcp_parallel_support_uses_exact_payload_server`
- `cargo test -p codex-core
direct_mcp_tools_register_namespaced_handlers`
- `cargo test -p codex-core
search_tool_description_lists_each_mcp_source_once`
- `cargo test -p codex-mcp
list_all_tools_uses_startup_snapshot_while_client_is_pending`
- `just fix -p codex-core -p codex-mcp -p codex-otel`
## Why
`codex exec-server` should keep the existing public `ws://IP:PORT` URL
shape while serving that websocket connection through an HTTP upgrade
path internally. That keeps the client-facing configuration simple and
allows the listener to work through intermediate HTTP-aware
infrastructure.
## What changed
- keep the emitted and configured exec-server URL as `ws://IP:PORT`
- serve that websocket endpoint through Axum HTTP upgrade handling on
`/`
- expose `GET /readyz` from the same listener for readiness checks
- route upgraded Axum websocket streams through the shared JSON-RPC
connection machinery
- initialize the rustls crypto provider before websocket client
connections
- preserve inbound binary websocket JSON-RPC parsing for compatibility
with the prior transport behavior
## Verification
- `cargo test -p codex-exec-server --test health --test process --test
websocket --test initialize --test exec_process`
## Why
While investigating `codex exec hi` startup latency, the useful
questions were not "is startup slow?" but "which durable bucket is slow
in production?"
The path we observed has a few distinct stages:
1. `thread/start` creates the session
2. startup prewarm builds the turn context, tools, and prompt
3. startup prewarm warms the websocket
4. the first real turn resolves the prewarm
5. the model produces the first token
Before this PR, production telemetry had some of the raw measurements
already:
- aggregate startup-prewarm duration / age-at-first-turn metrics
- TTFT as a metric
- websocket request telemetry
But there was no coherent production event stream for the startup
breakdown itself, and TTFT was metric-only. That made it hard to answer
the same latency questions from OpenTelemetry-backed logs without adding
one-off local instrumentation.
## What changed
Add durable production telemetry on the existing `SessionTelemetry`
path:
- new `codex.startup_phase` OTel log/trace events plus
`codex.startup.phase.duration_ms`
- new `codex.turn_ttft` OTel log/trace events while preserving the
existing TTFT metric
The startup phase event is emitted for the coarse buckets we actually
observed while running `exec hi`:
- `thread_start_create_thread`
- `startup_prewarm_total`
- `startup_prewarm_create_turn_context`
- `startup_prewarm_build_tools`
- `startup_prewarm_build_prompt`
- `startup_prewarm_websocket_warmup`
- `startup_prewarm_resolve`
These phases are intentionally low-cardinality so they remain safe as
production telemetry tags.
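
One way to keep phase tags low-cardinality is to model them as a closed enum, as in the sketch below. The tag strings are the ones listed above; the `StartupPhase` enum, `as_str`, and `time_phase` are hypothetical and do not reflect the actual `SessionTelemetry` API.

```rust
use std::time::{Duration, Instant};

/// Closed set of startup phases; a fixed enum keeps tag cardinality bounded.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StartupPhase {
    ThreadStartCreateThread,
    StartupPrewarmTotal,
    StartupPrewarmCreateTurnContext,
    StartupPrewarmBuildTools,
    StartupPrewarmBuildPrompt,
    StartupPrewarmWebsocketWarmup,
    StartupPrewarmResolve,
}

impl StartupPhase {
    /// The tag value emitted with the phase event.
    pub const fn as_str(self) -> &'static str {
        match self {
            Self::ThreadStartCreateThread => "thread_start_create_thread",
            Self::StartupPrewarmTotal => "startup_prewarm_total",
            Self::StartupPrewarmCreateTurnContext => "startup_prewarm_create_turn_context",
            Self::StartupPrewarmBuildTools => "startup_prewarm_build_tools",
            Self::StartupPrewarmBuildPrompt => "startup_prewarm_build_prompt",
            Self::StartupPrewarmWebsocketWarmup => "startup_prewarm_websocket_warmup",
            Self::StartupPrewarmResolve => "startup_prewarm_resolve",
        }
    }
}

/// Time a closure and return its result with the phase tag and duration,
/// ready to hand to whatever emits the telemetry event.
pub fn time_phase<T>(phase: StartupPhase, f: impl FnOnce() -> T) -> (T, &'static str, Duration) {
    let start = Instant::now();
    let out = f();
    (out, phase.as_str(), start.elapsed())
}
```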
## Why this shape
This keeps the instrumentation on the same production path as the rest
of the session telemetry instead of adding a local debug-only trace
mode. It also avoids changing startup behavior:
- prewarm still runs
- no control flow changes
- no extra remote calls
- no user-visible behavior changes
One boundary is intentional: very early process bootstrap that happens
before a session exists is not included here, because this PR uses
session-scoped production telemetry. The expensive buckets we were
trying to understand after `thread/start` are now covered durably.
## Verification
- `cargo test -p codex-otel`
- `cargo test -p codex-core turn_timing`
- `cargo test -p codex-core
regular_turn_emits_turn_started_without_waiting_for_startup_prewarm`
- `cargo test -p codex-core
interrupting_regular_turn_waiting_on_startup_prewarm_emits_turn_aborted`
- `cargo test -p codex-app-server thread_start`
- `just fix -p codex-otel -p codex-core -p codex-app-server`
I also ran `cargo test -p codex-core`; it built successfully and then
hit an existing unrelated stack overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.
## Summary
- add multi-environment apply_patch routing for both freeform and
function-call tool flows
- parse and reconcile the optional environment selector in the main
apply_patch parser, then verify against the selected environment in the
handler
- carry environment_id through runtime and approval surfaces so
remote-targeted patches stay explicit end to end
## Testing
- `just fmt`
- remote exec-server e2e: `cargo test -p codex-core --test all
apply_patch_multi_environment_uses_remote_executor -- --nocapture` on
dev via `scripts/test-remote-env.sh`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Update `codex remote-control` to use the new app server daemon commands
instead of its previous startup path.
- if the updater loop is not running, bootstrap the daemon with remote
control enabled (`codex app-server daemon bootstrap --remote-control`)
- otherwise, enable the persisted remote-control setting and start the
daemon normally
## Why
Managed hook configs need a shared cross-platform shape without making
the existing `command` field polymorphic. The common case is still one
command string, with Windows needing a different entrypoint only when
the runtime is actually Windows.
Keeping `command` as the portable/default path and adding an optional
Windows override keeps the config easier to read, preserves the existing
scalar shape for non-Windows users, and avoids forcing every caller into
a `{ unix, windows }` object when only one platform needs special
handling.
## What
- Add optional `command_windows` / `commandWindows` alongside the
existing hook `command` field.
- Resolve `command_windows` only on Windows during hook discovery; other
platforms continue to use `command` unchanged.
- Keep trust hashing aligned to the effective command selected for the
current runtime.
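
The selection rule can be sketched as below. `HookCommand` and its methods are hypothetical; only the `command` and `command_windows` field names come from this change.

```rust
/// Simplified hook config: a portable command plus an optional
/// Windows-only override.
pub struct HookCommand {
    pub command: String,
    pub command_windows: Option<String>,
}

impl HookCommand {
    /// Pick the command for a given platform. The override applies only on
    /// Windows; everywhere else `command` is used unchanged.
    pub fn effective(&self, is_windows: bool) -> &str {
        match (&self.command_windows, is_windows) {
            (Some(windows), true) => windows.as_str(),
            _ => self.command.as_str(),
        }
    }

    /// Convenience wrapper that uses the platform this binary was built for.
    pub fn effective_for_current_platform(&self) -> &str {
        self.effective(cfg!(windows))
    }
}
```

Hashing the string returned by `effective` rather than the raw fields is one way to keep trust hashing aligned with the command that will actually run.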
## Docs
The Codex hooks/config reference should document `command_windows` as
the Windows-only override for command hooks.
## Why
Review telemetry should describe reviews as first-class events, not only
as counters denormalized onto terminal tool-item events. That lets us
analyze guardian and user reviews consistently across command execution,
file changes, permissions, and network access, while still preserving
the terminal item summaries that existing tool analytics need.
To make those review events accurate, analytics also needs the observed
completion time for each review and enough command metadata to
distinguish `shell` from `unified_exec` reviews.
## What changed
- emit generic `codex_review_event` rows for completed user and guardian
reviews, with review subjects, reviewer, trigger, terminal status,
resolution, and observed duration
- reduce approval request / response / abort facts into review events
for command execution, file change, and permissions flows
- keep denormalized review counts, final approval outcome, and
permission-request flags on terminal tool-item events for
item-associated reviews
- plumb review completion timing so user-review responses and aborts use
app-server-observed completion times, while guardian analytics reuse the
same terminal timestamps emitted on guardian assessment events
- carry command approval `source` through the protocol and app-server
layers so review analytics can distinguish `shell` from `unified_exec`
- add analytics coverage for user-review emission, guardian-review
emission, permission reviews that should not denormalize onto tool
items, item-summary isolation across threads, and the serialized
review-event shape
## Verification
- `cargo test -p codex-analytics`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18748).
* __->__ #18748
* #21434
* #18747
* #17090
* #17089
* #20514
## Why
The Python SDK needs the same tight formatter/lint loop as the rest of
the repo: a safe Ruff autofix pass, Ruff formatting, editor save
behavior, and CI checks that catch drift. Without that loop, SDK changes
can land with formatting or import ordering that differs from what
reviewers and CI expect.
## What
- Add Ruff configuration to `sdk/python/pyproject.toml`, excluding
generated protocol code and notebooks from the normal lint/format pass.
- Update `just fmt` so it still formats Rust and also runs Python SDK
Ruff autofix and formatting.
- Add Python SDK CI steps for `ruff check` and `ruff format --check`
before pytest.
- Recommend the Ruff VS Code extension and enable Python
format/fix/organize-on-save so Cmd+S uses the same tooling.
- Apply the resulting Ruff formatting to SDK Python files, examples, and
the checked-in generated `v2_all.py` output emitted by the pinned
generator.
- Add a guard test for the `just fmt` recipe so it keeps working from
both Rust and Python SDK working directories.
## Stack
1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. This PR `[8/8]` Add Python SDK Ruff formatting
## Verification
- Added `test_root_fmt_recipe_formats_rust_and_python_sdk` for the
shared format recipe.
- Ran `just fmt` after the recipe update.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
The SDK had behavioral tests that replaced SDK client internals. Those
tests could catch wrapper mistakes, but they did not prove the pinned
app-server runtime, generated notification models, request routing, and
sync/async public clients worked together.
This PR adds deterministic integration coverage that starts the pinned
`codex app-server` process and mocks only the upstream Responses HTTP
boundary.
## What
- Add `AppServerHarness` and `MockResponsesServer` helpers for isolated
`CODEX_HOME`, mock-provider config, queued SSE responses, and captured
`/v1/responses` requests.
- Add shared helpers for SSE construction, stream assertions,
approval-policy inspection, and image fixtures.
- Split integration coverage into focused modules for run behavior,
inputs, streaming, turn controls, approvals, and thread lifecycle.
- Cover sync and async `Thread.run`, `TurnHandle.stream`, interleaved
streams, approval-mode persistence, lifecycle helpers, final-answer
phase handling, image inputs, loaded skill input injection, steering,
interruption, listing, history reads, run overrides, and token usage
mapping.
- Replace public-wrapper tests that duplicated integration-test behavior
with lower-level client tests only where direct client behavior is the
thing under test.
## Stack
1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. This PR `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting
## Verification
- Added pinned app-server integration tests under
`sdk/python/tests/test_app_server_*.py` and
`test_real_app_server_integration.py`.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
The high-level SDK should expose the approval behavior it actually
supports instead of leaking generated app-server routing fields. New
work should have two clear choices: default auto review, or explicitly
deny escalated permission requests. Existing threads and subsequent
turns should preserve their current approval behavior unless the caller
passes an override.
## What
- Add the public `ApprovalMode` enum with `auto_review` and `deny_all`.
- Default new thread creation to `ApprovalMode.auto_review`.
- Preserve existing approval settings by default for resume, fork, run,
and turn helpers.
- Remove raw `approval_policy` / `approvals_reviewer` kwargs from
high-level SDK wrappers.
- Update generated wrapper output, docs, examples, notebooks, and tests
for the high-level approval mode API.
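The defaulting rules above can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: only the `ApprovalMode` member names come from this PR, while the string values and the `resolve_approval_mode` helper are hypothetical.

```python
from enum import Enum
from typing import Optional


class ApprovalMode(Enum):
    # Member names come from the PR text; the string values are assumed.
    auto_review = "auto_review"
    deny_all = "deny_all"


def resolve_approval_mode(
    override: Optional[ApprovalMode],
    existing: Optional[ApprovalMode],
) -> ApprovalMode:
    """Hypothetical helper: pick the mode for a thread or turn.

    `existing` is None for a brand-new thread and holds the current
    setting for resume, fork, run, and turn helpers.
    """
    if override is not None:
        return override  # an explicit caller override always wins
    if existing is not None:
        return existing  # preserve what the thread already uses
    return ApprovalMode.auto_review  # default for new threads
```

Keeping "no override" distinct from "auto review" is what lets existing threads keep their current behavior unless the caller asks for a change.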
## Stack
1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. This PR `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting
## Verification
- Added approval-mode mapping/default tests for new threads, existing
threads, forks, resumes, and subsequent turns.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
The SDK should publish under the reserved public distribution name
`openai-codex`, and its import module should match that name in the
Python style. Since package names can contain hyphens but import modules
cannot, the public import path becomes `openai_codex`.
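The naming relationship can be illustrated with a short sketch. `import_name_for` is a hypothetical helper used only for this example; it is not part of the SDK.

```python
def import_name_for(distribution: str) -> str:
    """Hypothetical helper: derive an import-safe module name."""
    # Hyphens are legal in distribution names but not in Python identifiers.
    candidate = distribution.replace("-", "_")
    if not candidate.isidentifier():
        raise ValueError(f"{candidate!r} is not a valid module name")
    return candidate
```

For example, `import_name_for("openai-codex")` yields the `openai_codex` import path described above.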
Keeping the rename separate from the public API surface change makes the
naming change easy to review and avoids mixing it with API curation.
## What
- Rename the SDK distribution from `openai-codex-app-server-sdk` to
`openai-codex`.
- Rename the import package from `codex_app_server` to `openai_codex`.
- Keep the runtime wheel as the separate `openai-codex-cli-bin`
dependency.
- Update docs, examples, notebooks, artifact scripts, lockfile metadata,
and tests for the new distribution/module names.
## Stack
1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. This PR `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting
## Verification
- Updated package metadata and public API tests to assert the
distribution and import names.
Co-authored-by: Codex <noreply@openai.com>
## Why
The SDK package root should be the ergonomic public client API, not a
dump of every generated app-server schema type. Generated models still
need a supported import path, but callers should be able to tell which
names are high-level SDK entrypoints and which names are protocol value
models.
## What
- Define a curated root `__all__` for clients, handles, input helpers,
retry helpers, config, and public errors.
- Add a `types` module as the supported home for generated app-server
response, event, enum, and helper models.
- Update docs and examples to import protocol/value models from the type
module.
- Add tests that lock root exports, type-module exports, star-import
behavior, and example import hygiene.
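The star-import check can be sketched like this, assuming a stand-in module rather than the real package root. `star_import_names`, `fake_root`, and the attribute names on it are illustrative only.

```python
import types


def star_import_names(module: types.ModuleType) -> set:
    """Return the names that `from module import *` would bind."""
    exported = getattr(module, "__all__", None)
    if exported is None:
        # Without __all__, star-import takes every name that does not
        # start with an underscore.
        exported = [name for name in vars(module) if not name.startswith("_")]
    return set(exported)


# Stand-in for the SDK package root; these names are made up for the example.
fake_root = types.ModuleType("fake_root")
fake_root.Client = object
fake_root.GeneratedSchemaModel = object
fake_root.__all__ = ["Client"]
```

With a curated `__all__`, the generated model stays importable by attribute access but no longer leaks through `import *`.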
## Stack
1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. This PR `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting
## Verification
- Added public API signature tests for root exports, `types` exports,
and example imports.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
The Python SDK stack now depends on packaging metadata, pinned runtime
wheels, generated artifacts, async behavior, and stream interleaving.
Those checks need to run in CI so future changes cannot bypass the SDK
test suite.
## What
- Add a dedicated `python-sdk` job to `.github/workflows/sdk.yml`.
- Run the job in `python:3.12-alpine` so dependency resolution exercises
the pinned musl runtime wheel.
- Keep the Python SDK test job parallel to the existing SDK job instead
of serializing the full workflow.
## Stack
1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. This PR `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting
## Verification
- The added workflow job installs the SDK with `uv sync --extra dev
--frozen` and runs the Python SDK pytest suite.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Once the SDK declares its runtime package, generated Python artifacts
should come from that pinned runtime rather than whatever app-server
schema happens to be in the current checkout. That keeps the generated
API and model surface aligned with the runtime users install.
## What
- Teach `scripts/update_sdk_artifacts.py generate-types` to invoke the
pinned runtime package for schema generation.
- Regenerate `v2_all.py`, `notification_registry.py`, and generated
public wrapper methods from that schema.
- Add freshness coverage so regenerating from the pinned runtime must
leave checked-in artifacts unchanged.
## Stack
1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. This PR `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting
## Verification
- Added `test_generated_files_are_up_to_date` for pinned-runtime
generation drift.
- Added generator-structure tests for schema annotation and notification
metadata generation.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
The Python SDK depends on the app-server runtime package for the bundled
`codex` binary and schema source of truth. That relationship should be
explicit in package metadata instead of inferred from matching version
numbers, so installers, lockfiles, and reviewers can see exactly which
runtime the SDK expects.
## What
- Declare `openai-codex-cli-bin==0.131.0a4` as a Python SDK dependency.
- Update runtime setup helpers to resolve the runtime version from the
declared dependency pin.
- Refresh the SDK lockfile for the pinned runtime wheel.
- Update package/runtime tests and docs that describe where the runtime
version comes from.
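Resolving the runtime version from the declared pin might look roughly like the sketch below. The `pinned_runtime_version` helper is hypothetical; only the distribution name and the `==` pin style come from this PR.

```python
import re
from typing import Iterable

RUNTIME_DISTRIBUTION = "openai-codex-cli-bin"  # name taken from the PR text


def pinned_runtime_version(requirements: Iterable[str]) -> str:
    """Hypothetical helper: return the exact version pinned for the runtime."""
    pattern = re.compile(
        rf"^\s*{re.escape(RUNTIME_DISTRIBUTION)}\s*==\s*([^\s;,]+)"
    )
    for requirement in requirements:
        match = pattern.match(requirement)
        if match:
            return match.group(1)
    # Refuse to guess: a missing or non-exact pin is an error.
    raise LookupError(f"{RUNTIME_DISTRIBUTION} is not pinned with '=='")
```

Failing loudly when the pin is absent keeps the dependency declaration as the only source of the runtime version.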
## Stack
1. This PR `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting
## Verification
- Added coverage for the SDK runtime dependency pin and runtime
distribution naming.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
The permissions migration is making
`permissions.<profile>.network.enabled` the canonical sandbox network
bit, while proxy startup is a separate concern. Enabling network access
should not implicitly start the proxy, and users who are still on legacy
sandbox modes need a separate place to opt into proxy startup and
provide proxy-specific settings.
This follow-up to #19900 gives the network proxy its own feature surface
instead of overloading permission-profile network semantics.
## What changed
- Add an experimental `network_proxy` feature with a configurable
`[features.network_proxy]` table.
- Overlay `features.network_proxy` settings onto the configured proxy
state after permission-profile selection, so the proxy only starts when
the active `NetworkSandboxPolicy` already allows network access.
- Preserve `[experimental_network]` startup behavior independently of
the new feature flag.
## Behavior and examples
There are now three related knobs:
- `permissions.<profile>.network.enabled` controls whether the active
permission profile has network access at all.
- `features.network_proxy` enables proxy restrictions for an
already-network-enabled profile.
- Legacy `sandbox_mode` plus `[sandbox_workspace_write].network_access`
still control whether legacy `workspace-write` has network access at
all.
The rule is:
- network off + proxy flag on -> network stays off, proxy is a no-op
- network on + proxy flag off -> unrestricted direct network
- network on + proxy flag on -> network stays on, with proxy
restrictions applied
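As a rough model of the rule (not the `codex-core` implementation), the two inputs reduce to a small pure function. `NetworkOutcome` and `resolve_network` are names invented for this sketch, and it deliberately ignores `[experimental_network]`, which starts the proxy on its own.

```python
from typing import NamedTuple


class NetworkOutcome(NamedTuple):
    network_allowed: bool  # does the sandbox have network access at all?
    proxy_started: bool    # is the restricting proxy running?


def resolve_network(network_enabled: bool, proxy_feature: bool) -> NetworkOutcome:
    """Illustrative model of the rule above."""
    # The proxy can only restrict access that already exists, so it never
    # starts when the sandbox network bit is off.
    return NetworkOutcome(
        network_allowed=network_enabled,
        proxy_started=network_enabled and proxy_feature,
    )
```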
For permission profiles, the feature toggle adds proxy restrictions only
when network access is already enabled:
```toml
default_permissions = "workspace"
[permissions.workspace.filesystem]
":minimal" = "read"
[permissions.workspace.network]
enabled = true
[features]
network_proxy = true
```
If `network.enabled = false`, the same feature flag is a no-op: network
remains off and the proxy does not start.
For legacy sandbox config, `network_access` remains the master switch:
```toml
sandbox_mode = "workspace-write"
[sandbox_workspace_write]
network_access = true
[features]
network_proxy = true
```
That keeps legacy `workspace-write` network access on, but routes it
through the proxy policy. If `network_access = false`, the proxy feature
is a no-op and legacy `workspace-write` remains offline.
The same proxy opt-in can be supplied from the CLI:
```bash
codex -c 'features.network_proxy=true'
```
Additional proxy settings can be supplied when a table is needed:
```bash
codex \
-c 'features.network_proxy.enabled=true' \
-c 'features.network_proxy.enable_socks5=false'
```
The intended behavior matrix is:
| Config surface | Network setting | `features.network_proxy` | Direct sandbox network | Proxy |
| --- | --- | --- | --- | --- |
| Permission profile | `network.enabled = false` | off | restricted | off |
| Permission profile | `network.enabled = false` | on | restricted | off |
| Permission profile | `network.enabled = true` | off | enabled | off |
| Permission profile | `network.enabled = true` | on | enabled | on |
| Legacy `workspace-write` | `network_access = false` | off | restricted | off |
| Legacy `workspace-write` | `network_access = false` | on | restricted | off |
| Legacy `workspace-write` | `network_access = true` | off | enabled | off |
| Legacy `workspace-write` | `network_access = true` | on | enabled | on |
`[experimental_network]` requirements remain separate from the user
feature toggle and still start the proxy on their own.
Relevant code:
- [`features/src/feature_configs.rs`](https://github.com/openai/codex/blob/43785aff47/codex-rs/features/src/feature_configs.rs#L58-L117) defines the feature-specific proxy config.
- [`core/src/config/mod.rs`](https://github.com/openai/codex/blob/43785aff47/codex-rs/core/src/config/mod.rs#L1959-L1964) reads the feature table, and [later applies it only when network access is already enabled](https://github.com/openai/codex/blob/43785aff47/codex-rs/core/src/config/mod.rs#L2448-L2458).
## Verification
Added focused coverage for:
- keeping the proxy off when `features.network_proxy` is enabled but
sandbox network access is disabled
- the full permission-profile and legacy `workspace-write` matrix above
- preserving `[experimental_network]` startup without the feature
- reusing profile-supplied proxy settings when the feature is enabled
Ran:
- `cargo test -p codex-features`
- `cargo test -p codex-core network_proxy_feature`
- `cargo test -p codex-core
experimental_network_requirements_enable_proxy_without_feature`
## Summary
- revoke previously stored managed ChatGPT tokens after a successful
re-login
- keep the new login successful even when revocation is unavailable or
fails
- cover the shared persistence path used by browser and device-code
login flows
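The best-effort ordering described above can be sketched as follows. This is an assumption-laden illustration: `persist_login`, the dict-based `store`, and the `revoke` callable are hypothetical stand-ins for the shared persistence path.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def persist_login(
    store: dict,
    new_tokens: str,
    revoke: Callable[[str], None],
) -> Optional[str]:
    """Hypothetical shared persistence step for both login flows.

    Saves the new credentials first, then tries to revoke the superseded
    ones. Returns the superseded token if it was revoked, else None.
    """
    previous = store.get("tokens")
    store["tokens"] = new_tokens  # the new login is committed before revocation
    if previous is None or previous == new_tokens:
        return None
    try:
        revoke(previous)
    except Exception as error:  # revocation is best effort
        logger.warning("could not revoke superseded tokens: %s", error)
        return None
    return previous
```

Committing the new credentials before calling `revoke` is what keeps a failed or unavailable revocation from failing the login itself.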
## Why
A new `codex login` currently overwrites existing managed ChatGPT
credentials without attempting to revoke the superseded tokens, leaving
old credentials valid longer than necessary.
## Validation
- `just fmt`
- `CARGO_HOME=/tmp/cargo-home cargo test -p codex-login`
## Notes
- Initial local Cargo validation hit a corrupt existing crate cache in
the default `CARGO_HOME`; rerunning with a clean temporary `CARGO_HOME`
passed.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`bootstrap` starts a detached pid-backed updater loop, but before this
change that updater could keep running an old executable image even
after `install.sh` replaced the managed standalone binary under
`CODEX_HOME`. That left the updater itself behind the binary it had just
rolled out, especially when the app-server was stopped or when the
managed binary changed without a version-string change.
## What changed
- Track updater identity from the executable contents rather than only
the reported CLI version.
- Force the managed app-server restart path when the managed binary
contents differ from the running updater image, then re-exec the updater
from the managed binary once the rollout is in a safe state.
- Distinguish a genuinely absent managed app-server from a managed
process that exists but is not yet probeable, so self-refresh does not
skip a required restart.
- Keep the restart/re-exec decision under the daemon operation lock so
`bootstrap` cannot race the handoff.
- Update `app-server-daemon/README.md` to document the resulting
standalone and out-of-band update behavior.
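A content-based identity check could look roughly like this. The PR does not say how contents are compared, so SHA-256 is an assumed choice, and `executable_identity` and `needs_refresh` are hypothetical names.

```python
import hashlib
from pathlib import Path


def executable_identity(path: Path) -> str:
    """Digest of the file contents; SHA-256 is an assumption of this sketch."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large binaries are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_refresh(running_image: Path, managed_binary: Path) -> bool:
    """True when the managed binary no longer matches the running image."""
    return executable_identity(running_image) != executable_identity(managed_binary)
```

Unlike a version-string comparison, this detects a replaced binary even when the reported CLI version is unchanged.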
## Verification
- `cargo test -p codex-app-server-daemon`
- `just fix -p codex-app-server-daemon`
Added focused unit coverage for:
- content-based updater refresh decisions
- safe updater re-exec outcomes across restart states
## Summary
Fixes #22128.
The `/keymap` flow already persists the `-` key as `minus`, and the
runtime keymap parser already accepts that spelling. `codex-config` was
the missing leg: it rejected `minus` during config deserialization, so a
binding saved by Codex could fail on the next startup or config reload.
## What Changed
- Accept `minus` as a valid canonical key name in `tui.keymap` config
normalization.
- Update the config validation message so its supported-key list
includes `minus`.
- Add regression coverage that deserializes both `minus` and `alt-minus`
under `[tui.keymap.global]` and verifies the normalized config shape.
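A simplified sketch of the normalization is below. It assumes that `-` also acts as the modifier separator, which is why the key itself needs a spelled-out name. The modifier and key sets here are illustrative, not the real supported-key list in `codex-config`, and `normalize_binding` is a hypothetical function.

```python
# Illustrative only: the real supported-key list lives in `codex-config`.
MODIFIERS = {"ctrl", "alt", "shift"}
NAMED_KEYS = {"minus", "enter", "tab", "esc"}


def normalize_binding(spec: str) -> str:
    """Validate a `modifier-...-key` string and return it lowercased."""
    *modifiers, key = spec.lower().split("-")
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    if key not in NAMED_KEYS and len(key) != 1:
        raise ValueError(f"unsupported key name: {key!r}")
    return "-".join([*modifiers, key])
```

Under this model `alt-minus` round-trips cleanly, while a literal `alt--` splits into empty tokens and is rejected.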
## How to Test
1. Start Codex TUI.
2. Run `/keymap`.
3. Assign the `-` key to an action and save the change.
4. Restart Codex or reload the config.
5. Confirm the config loads normally and the saved binding remains
usable instead of failing on `minus`.
6. As a focused regression check, repeat with a modifier form such as
`alt--` captured through `/keymap`, which persists as `alt-minus` and
should also reload successfully.
Targeted tests:
- `cargo test -p codex-config`
## Why
We've added support for auth elicitation behind the `auth_elicitation`
flag, but to stay backward compatible, servers need to explicitly check
the capability before they decide to send elicitations. This PR
advertises the capability, conditioned on the flag.
## What changed
- Build `client_elicitation_capability` from the `AuthElicitation`
feature state.
- Thread that capability through MCP config, session startup, and
`McpConnectionManager` so RMCP initialization advertises the correct
elicitation support.
- Advertise both `form` and `url` elicitation when the feature is
enabled, and preserve the empty default capability when it is disabled.
- Add coverage for the feature-derived config shape and the advertised
initialization payload.
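The flag-to-capability mapping can be sketched as a single function. The dict shape (one empty object per elicitation mode) is an assumption about the wire format, and this Python function only mirrors the described behavior; it is not the Rust code.

```python
def client_elicitation_capability(auth_elicitation_enabled: bool) -> dict:
    """Hypothetical mirror of the capability builder described above."""
    if not auth_elicitation_enabled:
        return {}  # empty default capability when the feature is off
    # Assumed wire shape: one empty object per supported elicitation mode.
    return {"form": {}, "url": {}}
```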
## Testing
- `cargo test -p codex-mcp`
- `cargo test -p codex-core
to_mcp_config_preserves_auth_elicitation_feature_from_config`
- `cargo test -p codex-core` *(currently fails outside this change in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`
with a stack overflow after unrelated tests have started running)*
## Why
Managed requirements can already centrally disable apps, but they could
not express the per-tool app approval rules that normal config already
supports. That left admins without a way to enforce connector tool
approvals through `/etc/codex/requirements.toml` or cloud requirements.
## What changed
- Extend app requirements with per-tool `approval_mode` entries.
- Merge managed app tool requirements across managed sources while
preserving higher-precedence exact tool settings.
- Apply managed tool approvals separately from user app config so
managed policy is matched only on raw MCP `tool.name`, while user config
keeps the existing raw-name-then-title convenience fallback.
- Add coverage for local requirements, cloud requirements parsing,
managed-over-user precedence, and a title-collision case that must not
widen managed auto-approval.
## Configuration shape
Local `/etc/codex/requirements.toml` and cloud requirements use the same
TOML shape:
```toml
[apps.connector_123123.tools."calendar/list_events"]
approval_mode = "approve"
```
This is a per-tool approval rule keyed by app ID and raw MCP tool name,
not an app-level boolean such as `apps.connector_123123.approve = true`.
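The matching difference between managed and user rules can be sketched as below. Both helpers are hypothetical, and `calendar/delete_event` in the usage note is a made-up tool name used only to show the title-collision case.

```python
from typing import Optional


def managed_approval(rules: dict, tool_name: str) -> Optional[str]:
    """Managed policy: match on the raw MCP tool name only."""
    return rules.get(tool_name)


def user_approval(rules: dict, tool_name: str, title: Optional[str]) -> Optional[str]:
    """User config: raw name first, then the title as a convenience fallback."""
    if tool_name in rules:
        return rules[tool_name]
    if title is not None:
        return rules.get(title)
    return None
```

With a rule for `calendar/list_events`, a different tool such as `calendar/delete_event` whose title happens to read `calendar/list_events` gets no managed approval, so a title collision cannot widen managed auto-approval.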
## Why
Managed filesystem `deny_read` requirements are administrator-enforced
restrictions on specific paths. Once those requirements are active,
Codex should not drop them just because an execution path would
otherwise leave the sandbox.
Before this change, an explicit escalation, a prefix-rule allow, a
sandbox-denial retry, or an app-server legacy sandbox override could
rebuild the runtime policy without those managed read-deny entries and
expose a path the administrator had marked unreadable.
This is narrower than general sandbox-mode constraints. If an enterprise
only sets `allowed_sandbox_modes`, a trusted `prefix_rule(..., decision
= "allow")` can still run its matching command unsandboxed; this PR only
preserves managed filesystem `deny_read` restrictions across those
paths.
## What Changed
- Mark filesystem policies built from managed `deny_read` requirements
so callers can tell when those deny entries must survive escalation.
- Preserve managed deny-read entries when runtime permission profiles
are rebuilt through protocol, app-server, or legacy sandbox-policy
compatibility paths.
- Keep managed deny-read attempts inside the selected sandbox on the
first attempt and after sandbox-denial retries.
- Preserve the same behavior in the zsh-fork escalation path, including
prefix-rule-driven escalation.
- Add a regression test showing the opposite case too: without managed
deny-read, a prefix-rule allow still chooses unsandboxed execution.
## Verification
Targeted automated verification:
```shell
cargo test -p codex-core shell_request_escalation_execution_is_explicit -- --nocapture
cargo test -p codex-core prefix_rule_uses_unsandboxed_execution_without_managed_deny_read -- --nocapture
cargo test -p codex-core prefix_rule_preserves_managed_deny_read_escalation -- --nocapture
cargo test -p codex-protocol permission_profile_round_trip_preserves_filesystem_policy_metadata -- --nocapture
cargo test -p codex-protocol preserving_deny_entries_keeps_unrestricted_policy_enforceable -- --nocapture
cargo test -p codex-app-server-protocol permission_profile_file_system_permissions_preserves_policy_metadata -- --nocapture
cargo check -p codex-app-server -p codex-tui
```
Smoke-test invocations:
```shell
# macOS exact deny + allowed control
codex exec --skip-git-repo-check -C "$ROOT" \
-c 'default_permissions="deny_read_smoke"' \
-c 'permissions.deny_read_smoke.filesystem={":minimal"="read",":project_roots"={"."="write","secrets"="none","future-secret"="none","**/*.env"="none"}}' \
'Run shell commands only. Print the contents of allowed.txt. Then test whether reading secrets/exact-secret.txt succeeds without printing that file if it does. End with exactly two lines: allowed=<contents> and exact_secret=<BLOCKED or READABLE>.'
# Linux exact deny + allowed control
codex exec --skip-git-repo-check -C "$ROOT" \
-c 'default_permissions="deny_read_smoke"' \
-c 'permissions.deny_read_smoke.filesystem={":minimal"="read",glob_scan_max_depth=3,":project_roots"={"."="write","secrets"="none","future-secret"="none","**/*.env"="none"}}' \
'Run shell commands only. Print the contents of allowed.txt. Then test whether reading secrets/exact-secret.txt succeeds without printing that file if it does. End with exactly two lines: allowed=<contents> and exact_secret=<BLOCKED or READABLE>.'
```
Observed manual smoke matrix:
| Case | macOS Seatbelt | Linux bubblewrap |
| --- | --- | --- |
| `cat allowed.txt` | Pass | Pass |
| `cat secrets/exact-secret.txt` | Blocked | Blocked |
| `cat envs/root.env` | Blocked | Blocked |
| `cat envs/nested/one.env` | Blocked | Blocked |
| `cat envs/nested/two.env` | Blocked | Blocked |
| `cat alias-to-secrets/exact-secret.txt` | Blocked | Blocked |
| Missing denied path | A file created after sandbox setup remained unreadable | Creation was blocked by the reserved missing-path placeholder, and the placeholder was cleaned up after exit |
| Real `codex exec` shell turn | Pass | Pass |
Notes:
- The Linux smoke run used the fallback glob walker because the devbox
did not have `rg` installed.
- The smoke matrix verifies the end-to-end filesystem behavior on macOS
and Linux; the escalation-specific behavior is covered by the focused
tests above.
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Charlie Marsh <charliemarsh@openai.com>
## Summary
Remote clients can still receive large `thread/resume` histories when
prior turns include MCP tool call payloads or image-generation results.
This adds a temporary response-only redaction path for the known remote
client names.
Longer term we will move towards fully paginated APIs backed by SQLite.
## Changes
- Redact MCP tool call payload-bearing fields in `thread/resume`
responses for `codex_chatgpt_android_remote` and
`codex_chatgpt_ios_remote`.
- Drop `imageGeneration` items from those `thread/resume` responses.
- Keep redaction out of persisted rollout files, `thread/read`,
`thread/turns/list`, live notifications, and token usage replay.
- Cover the behavior with app-server helper tests and a v2 resume
integration test that checks both remote clients plus a non-target
control client.
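A response-only redaction pass might look like the following sketch. The client names and the `imageGeneration` item type come from this PR; the `mcpToolCall` type string, the `arguments` and `result` field names, and the `redact_resume_items` helper are assumptions made for illustration.

```python
import copy

REMOTE_CLIENTS = {"codex_chatgpt_android_remote", "codex_chatgpt_ios_remote"}
# Assumed field names; the PR only says "payload-bearing fields".
PAYLOAD_FIELDS = ("arguments", "result")


def redact_resume_items(client_name: str, items: list) -> list:
    """Return a response-only view of `items`; the input is never mutated."""
    if client_name not in REMOTE_CLIENTS:
        return items  # non-target clients get the history unchanged
    redacted = []
    for item in items:
        if item.get("type") == "imageGeneration":
            continue  # dropped entirely for remote clients
        item = copy.deepcopy(item)
        if item.get("type") == "mcpToolCall":
            for field in PAYLOAD_FIELDS:
                if field in item:
                    item[field] = None
        redacted.append(item)
    return redacted
```

Working on a copy is what keeps persisted rollouts and other read paths unaffected.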
## Testing
- `cargo test -p codex-app-server thread_resume_redaction`
- `cargo test -p codex-app-server
thread_resume_redacts_payloads_for_chatgpt_remote_clients`
## Summary
This is the `exec-server` follow-up to #21759.
#21759 fixed the Windows `taskkill` output leak for the `rmcp-client`
MCP teardown path, but #22050 showed that `exec-server` still had a
parallel `taskkill /T /F` cleanup path in
`exec-server/src/connection.rs`. Because that command inherited the
parent stdio handles, Windows could still print `SUCCESS:` lines into
the user's terminal during stdio child cleanup.
This change silences that remaining `exec-server` callsite by
redirecting `taskkill` stdin, stdout, and stderr to `Stdio::null()`.
## What Changed
- add a Windows-only `Stdio` import in `exec-server/src/connection.rs`
- redirect the `taskkill` command in `kill_windows_process_tree` to
`Stdio::null()` for stdin, stdout, and stderr
- keep the existing kill semantics unchanged by still checking
`.status()` and preserving the existing fallback/logging behavior
## How to Test
Manual validation is Windows-only, so I did not run the UI repro path
locally here.
1. On Windows, use a Codex build from this branch.
2. Exercise an `exec-server` stdio flow that spawns a child process tree
and then triggers transport cleanup.
3. Confirm the child process tree is still torn down.
4. Confirm the terminal no longer shows `SUCCESS: The process with PID
... has been terminated.` lines during cleanup.
Targeted tests:
- `cargo test -p codex-exec-server
client::tests::dropping_stdio_client_terminates_spawned_process --
--exact`
- `cargo test -p codex-exec-server
client::tests::malformed_stdio_message_terminates_spawned_process --
--exact`
Notes:
- `cargo test -p codex-exec-server` still hits unrelated local macOS
`sandbox-exec: sandbox_apply: Operation not permitted` failures in
`tests/file_system.rs`.
## References
- Fixes the remaining callsite discussed in #22050
- Related earlier fix: #21759
## Summary
Restricts `is_known_safe_command` to the modes where it is explicitly
part of the documented behavior:
- when `environment_lacks_sandbox_protections`
- in `AskForApproval::UnlessTrusted`
Notably, as a result of this, escalations for commands that pass
`is_known_safe_command` are no longer auto-approved in
`AskForApproval::OnRequest` or `AskForApproval::Granular`.
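The gating can be modeled roughly as below, assuming the two listed conditions are alternatives (either one is enough). The Python enum only mirrors the variant names mentioned in this PR, and `known_safe_list_applies` is a hypothetical name.

```python
from enum import Enum, auto


class AskForApproval(Enum):
    # Only the variants named in this PR are modeled here.
    UNLESS_TRUSTED = auto()
    ON_REQUEST = auto()
    GRANULAR = auto()


def known_safe_list_applies(
    policy: AskForApproval,
    environment_lacks_sandbox_protections: bool,
) -> bool:
    """Hypothetical gate: may the known-safe command list be consulted?"""
    return (
        environment_lacks_sandbox_protections
        or policy is AskForApproval.UNLESS_TRUSTED
    )
```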
## Testing
- [x] Updated unit tests
- [x] Updated approvals scenario tests.
---------
Co-authored-by: Codex <noreply@openai.com>
This PR replaces the TUI’s file-only `@mention` popup with a unified
mentions experience. Typing `@...` now searches across filesystem
matches, installed plugins, and skills in one popup, with result types
clearly labeled and selectable from the same flow.
- Adds a unified `@mentions` popup that returns:
  - plugins
  - skills
  - files
  - directories
- Adds search modes so users can narrow the popup without changing their
query:
  - All Results _(default/same as Codex App)_
  - Filesystem Only
  - Plugins _(...and skills)_
- Preserves existing insertion behavior:
  - selected file paths are inserted into the prompt
  - paths with spaces are quoted
  - image file selections still attach as images when possible
  - selecting a plugin or skill inserts the corresponding `$name`
  - the composer records the canonical mention binding, such as
`plugin://...` or the skill path
- Expands `@mentions` rendering:
  - type tags for Plugin, Skill, File, and Dir
  - distinct plugin/filesystem colors
  - stable fixed-height layout (8 rows)
  - truncation behavior for narrow terminals
Note:
- The unified mentions popup does not display app connectors under
`@mention` results for Codex App parity. Connector mentions remain
available through the existing `$mention` path.
https://github.com/user-attachments/assets/f93781ed-57d3-4cb5-9972-675bc5f3ef3f
## Summary
- add SQLite init, backfill-gate, and fallback telemetry without
introducing a cross-cutting state-db access wrapper
- install one process-scoped telemetry sink after OTEL startup and let
low-level state/rollout paths emit through it directly
- add process-start metrics for the process owners that initialize
SQLite
---------
Co-authored-by: Owen Lin <owen@openai.com>
## Summary
- accumulate completed tool-item counts per turn from the item lifecycle
- populate the reserved count fields on `codex_turn_event`
- add reducer coverage for zero-count turns and mixed completed tool
items
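A per-turn accumulator along these lines would cover both the mixed and zero-count cases. The item kind names and the `TurnToolCounts` class are hypothetical; the actual reserved field names on `codex_turn_event` are not listed in this PR.

```python
from collections import Counter, defaultdict

# Hypothetical tool-item kinds; the real field names are not given here.
TOOL_KINDS = ("command_execution", "file_change", "mcp_tool_call")


class TurnToolCounts:
    def __init__(self) -> None:
        self._counts = defaultdict(Counter)

    def on_item_completed(self, turn_id: str, item_kind: str) -> None:
        """Called from the item lifecycle when a tool item completes."""
        self._counts[turn_id][item_kind] += 1

    def finish_turn(self, turn_id: str) -> dict:
        """Return explicit counts for every kind, including zeros."""
        counts = self._counts.pop(turn_id, Counter())
        return {kind: counts[kind] for kind in TOOL_KINDS}
```

Returning explicit zeros rather than omitting keys is what distinguishes a zero-count turn from a turn whose fields were never populated.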
## Why
PR #17090 moved tool-item analytics onto the item lifecycle, so the turn
reducer can now derive the per-turn tool counts from the same completed
items instead of leaving the reserved fields null.
## Validation
- `just fmt`
- `cargo test -p codex-analytics`
## Why
Long-running turns can accumulate enough denied auto-review decisions to
trip the global short-circuit even when those denials are spread far
apart. The breaker should still stop genuinely bad loops, but it should
judge recent behavior instead of lifetime turn history.
## What changed
- Replaced the lifetime `10 total denials` threshold with `10 denials in
the last 50 reviews`.
- Kept the existing `3 consecutive denials` interrupt behavior
unchanged.
- Tracked recent auto-review outcomes in the circuit breaker and updated
the warning copy to report the rolling-window count.
- Renamed the new rolling-window coverage to `auto_review_*` test names.
- Added coverage that confirms older denials fall out of the 50-review
window and no longer trigger the breaker.
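The rolling-window rule can be sketched with a bounded deque. The thresholds come from this PR; the `AutoReviewBreaker` class, its method names, and the inclusive comparisons are assumptions of this illustration rather than the `codex-core` implementation.

```python
from collections import deque

WINDOW = 50              # reviews considered "recent"
MAX_RECENT_DENIALS = 10  # denials inside the window that trip the breaker
MAX_CONSECUTIVE = 3      # consecutive denials that interrupt the turn


class AutoReviewBreaker:
    def __init__(self) -> None:
        # True marks a denied review; the oldest entry falls out automatically.
        self._recent = deque(maxlen=WINDOW)
        self._consecutive = 0

    def record(self, denied: bool) -> bool:
        """Record one review outcome; return True if the breaker trips."""
        self._recent.append(denied)
        self._consecutive = self._consecutive + 1 if denied else 0
        return (
            self._consecutive >= MAX_CONSECUTIVE
            or sum(self._recent) >= MAX_RECENT_DENIALS
        )
```

With one denial every ten reviews, the window never holds more than five denials, so a long turn keeps running where the old lifetime threshold would have tripped on the tenth denial.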
## Validation
- `just fmt`
- `cargo test -p codex-core guardian_rejection_circuit_breaker --lib`
- `cargo test -p codex-core auto_review_rejection_circuit_breaker --lib`
## Why
Users have requested the ability to edit a goal's objective after a goal
has been created. This PR exposes a new `/goal edit` command in the TUI
to address this request.
In the process of implementing this, I also noticed an existing bug in
the goal runtime. When a goal's objective is updated through the
`thread/goal/set` app server API, the goal runtime didn't emit a new
steering prompt to tell the agent about the new objective. This PR also
fixes this hole.
## What Changed
- Adds `/goal edit` in the TUI, opening an edit box prefilled with the
current goal objective.
- Keeps active and paused goals in their current state, resets completed
goals to active, keeps budget-limited goals budget-limited, and
preserves the existing token budget.
- Changes the existing `thread/goal/set` behavior so editing an
objective preserves goal accounting instead of resetting it. The older
reset-on-new-objective behavior was left over from before
`thread/goal/clear`; clients that need to reset accounting can now clear
the existing goal and create a new one.
- Reuses the existing goal set API path; this does not add or change
app-server protocol surface area.
- Adds a dedicated goal runtime steering prompt when an externally
persisted goal mutation changes the objective, so active turns receive
the updated objective.
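The status rules for an edited goal can be summarized in a small sketch. The enum and function names here are hypothetical stand-ins, not the real goal runtime types; only the transitions are taken from the list above.

```rust
/// Hypothetical stand-in for the goal status tracked by the runtime.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GoalStatus {
    Active,
    Paused,
    BudgetLimited,
    Complete,
}

/// Status a goal should have after its objective is edited:
/// completed goals become active again, everything else is unchanged.
pub fn status_after_edit(current: GoalStatus) -> GoalStatus {
    match current {
        GoalStatus::Complete => GoalStatus::Active,
        other => other,
    }
}
```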
## Validation
- Make sure `/goal edit` returns an error if no goal currently exists
- Make sure `/goal edit` displays an edit box that can be optionally
canceled with no side effects
- Make sure that an edited goal results in a steer so the agent starts
pursuing the new objective
- Make sure the new objective is reflected in the goal if you use
`/goal` to display the goal summary
- Make sure that `/goal edit` doesn't reset the token budget or the
time/token accounting on the updated goal

## Why
The app-server no longer needs to expose a TCP websocket listener.
Keeping that transport also kept around a separate listener/auth surface
that is unnecessary now that local clients can use stdio or the
Unix-domain control socket, while remote connectivity is handled by
`remote_control`.
## What Changed
- Removed `ws://IP:PORT` parsing and the `AppServerTransport::WebSocket`
startup path.
- Deleted the app-server websocket listener auth module and removed
related CLI flags/dependencies.
- Kept websocket framing only where it is still needed: over the
Unix-domain control socket and in the outbound `remote_control`
connection.
- Updated app-server CLI/help text and `app-server/README.md` to
document only `stdio://`, `unix://`, `unix://PATH`, and `off` for local
transports.
- Converted affected app-server integration coverage from TCP websocket
listeners to UDS-backed websocket connections, and added a parse test
that rejects `ws://` listen URLs.
- Removed the now-unused workspace `constant_time_eq` dependency and
refreshed `Cargo.lock` after `cargo shear` caught the drift.
- Moved test app-server UDS socket paths to short Unix temp paths so
macOS Bazel test sandboxes do not exceed Unix socket path limits.
## Verification
- Added/updated tests around UDS websocket transport behavior and
`ws://` listen URL rejection.
- `cargo shear`
- `cargo metadata --no-deps --format-version 1`
- `cargo test -p codex-app-server unix_socket_transport`
- `cargo test -p codex-app-server unix_socket_disconnect`
- `just fix -p codex-app-server`
- `git diff --check`
Local full Rust test execution was blocked before compilation by an
external fetch failure for the pinned `nornagon/crossterm` git
dependency. `just bazel-lock-update` and `just bazel-lock-check` were
retried after the manifest cleanup but remain blocked by external
BuildBuddy/V8 fetch timeouts.
Fixes #20792
## Why
`/goal`-first threads are valid resumable threads, but they can be
missing from `codex resume` and app recents because discovery depends on
metadata derived from a normal first user message.
PR #21489 attempted to fix this by using the goal objective as
`first_user_message`. Review feedback pointed out that
`first_user_message` does more than provide visible text today: it gates
listing, supplies preview text, and participates in deciding whether a
later title should surface as a distinct thread name. Reusing it for the
goal objective could leave a `/goal`-first thread with
`first_user_message=<goal>` and `title=<later prompt>`, even though the
goal should only provide the initial visible preview.
This PR follows that feedback: it keeps `first_user_message` as is and
introduces a new `preview` field to separate concerns. The `preview`
field is populated from the first user message or the goal objective. We
can extend it in the future to include other sources.
## What Changed
- Added internal thread `preview` metadata in `codex-state`, including a
SQLite migration that backfills from `first_user_message` and from
existing `thread_goals` objectives when needed.
- Treated `ThreadGoalUpdated` as preview-bearing metadata so goal-first
threads can be listed and searched without mutating
`first_user_message`.
- Updated rollout listing, state queries, thread-store conversion, and
app-server mapping to use preview metadata while continuing to expose
the existing public `preview` field.
- Preserved title/name distinctness behavior around literal
`first_user_message`, so a later normal prompt after `/goal` does not
surface as a separate name just because the goal supplied the initial
preview.
- Preserved compatibility for older/internal metadata writes by deriving
preview from `first_user_message` when explicit preview metadata is
absent.
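A minimal sketch of the fallback described above, assuming explicit preview metadata wins, then `first_user_message`, then the goal objective. The function name and the exact precedence between the last two sources are assumptions for illustration; the real logic lives in `codex-state` and the SQLite migration.

```rust
/// Hypothetical helper: picks the preview text for a thread.
/// Assumed precedence: explicit preview, then the first user message,
/// then the goal objective.
pub fn derive_preview(
    explicit_preview: Option<&str>,
    first_user_message: Option<&str>,
    goal_objective: Option<&str>,
) -> Option<String> {
    explicit_preview
        .or(first_user_message)
        .or(goal_objective)
        .map(str::to_owned)
}
```

Note that `first_user_message` is only read here, never overwritten, which is the separation of concerns this PR is after.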
## Verification
- Manually verified that a thread that starts with a `/goal <objective>`
shows up in the resume picker.
## Summary
This PR updates the goal continuation prompt to address feedback from
early adopters. There are two primary changes:
1. Goal continuation and budget-limit steering prompts now use hidden
user-context messages instead of hidden developer messages.
2. The goal continuation prompt is refined to improve the model's
ability to fully complete the active goal rather than stop at a smaller
or merely passing subset.
The user-message transition is important for two reasons. First, it
eliminates an issue where older steering messages could be responded to
again after a new turn. Second, it works better with compaction because
user messages are treated differently from developer messages during
compaction.
The prompt refinements make persistence explicit, ground work in current
evidence, encourage `update_plan` for multi-step progress visibility,
and require stronger completion audits before calling `update_goal`.
This PR also removes the elapsed-time reporting from the prompt; I saw
evidence that it was causing the model to shortcut work as it became
nervous about time.
These changes were tested with evals. Chriss4123 has also been running
independent evals in
[#19910](https://github.com/openai/codex/issues/19910), and many of the
improvements in this PR were suggested by him.
## Verification
- Tested with evals.
- Added and updated focused `codex-core` coverage for hidden goal user
context, continuation and budget-limit request shape, prompt rendering,
and objective delimiter escaping.
Addresses #22101
## Why
Side conversations are ephemeral forks of the active thread, but `/side`
was building its fork config from the app-level config after refreshing
it from disk. If the parent thread had runtime settings that differed
from the current persisted defaults, such as a changed model, reasoning
effort, permissions, reviewer, or fast-mode selection, the side
conversation could start with different behavior than its parent.
## What changed
- Build side fork config from the active parent `ChatWidget` config,
then overlay the parent thread's effective model, reasoning effort,
service tier, and fast-mode opt-out state.
- Forward model reasoning summary, verbosity, personality, web search
mode, and service-tier overrides through TUI app-server
start/resume/fork lifecycle params.
- Add focused tests for parent runtime inheritance, side developer
guardrail preservation, and lifecycle param forwarding.
## Why
Dogfooder feedback exposed two correctness gaps in normal-loop overflow
recovery:
1. a sampling request that hit `ContextWindowExceeded` could keep
re-entering auto-compaction indefinitely if the compacted retry still
did not fit, and
2. local compact-history rebuilds flattened user messages down to text,
so an overflowing `[image, "what is this?"]` turn could be retried
without the image after compaction.
That means recovery could either fail to terminate cleanly or proceed
with a materially weakened version of the user request.
## What changed
- Move normal-loop `ContextWindowExceeded` handling into the sampling
retry loop, so successful rescue compaction consumes the provider retry
budget instead of creating an unbounded outer-turn loop.
- Keep compacted user-history rebuilds structured:
`collect_user_messages` now carries user `UserInput` content rather than
flattened strings, and `build_compacted_history` reconstructs full user
messages from that structured representation.
- Preserve image inputs while retaining the existing text-budget
truncation behavior for compacted user history.
- Preserve existing compaction-task failure handling and client-session
reset behavior while bounding repeated overflow retries.
- Add focused regression coverage for:
- recovery after a normal-loop overflow,
- retry-budget exhaustion after repeated overflow,
- local recovery preserving image + text input,
- remote recovery preserving image + text input,
- remote compaction v2 preserving image + text input, and
- compaction failure still terminating cleanly.
The main behavior changes are in `codex-rs/core/src/session/turn.rs` and
`codex-rs/core/src/compact.rs`.
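The bounded recovery described above can be sketched as a retry loop in which every overflow consumes one unit of a fixed budget. This is a simplified illustration under assumed names (`SampleOutcome`, `TurnError`, `sample_with_recovery`); it does not mirror the actual code in `turn.rs`.

```rust
/// Outcome of one sampling attempt (hypothetical, simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SampleOutcome {
    Completed,
    ContextWindowExceeded,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TurnError {
    RetryBudgetExhausted,
}

/// Runs `sample` until it completes, compacting after each overflow.
/// Each overflow spends one retry, so compaction can run at most
/// `max_retries` times before the turn fails cleanly.
pub fn sample_with_recovery(
    max_retries: u32,
    mut sample: impl FnMut() -> SampleOutcome,
    mut compact: impl FnMut(),
) -> Result<u32, TurnError> {
    let mut retries_used = 0;
    loop {
        match sample() {
            SampleOutcome::Completed => return Ok(retries_used),
            SampleOutcome::ContextWindowExceeded => {
                if retries_used == max_retries {
                    return Err(TurnError::RetryBudgetExhausted);
                }
                compact();
                retries_used += 1;
            }
        }
    }
}
```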
## Verification
- Not run locally; relying on PR CI for this update.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
A user reported that `/goal` was not saved to the TUI command history,
which made it unavailable for later recall even though other accepted
input paths persist history entries.
This updates the TUI goal slash-command dispatch so successful `/goal`
invocations append the command text to message history. The change
covers the bare `/goal` menu command, goal control commands such as
`/goal pause`, and objective-setting commands such as `/goal improve
benchmark coverage`.
## Verification
- `cargo test -p codex-tui goal_slash_command -- --nocapture`
## Why
This is the next narrow step toward moving concrete tool families out of
core. After #22138 introduced `codex-tool-api`, we still needed a real
end-to-end seam that lets an extension own an executable tool definition
once and have core install it without the temporary `extension-api`
wrapper or a dependency on `codex-tools`.
`codex-tool-api` is the small extension-facing execution contract, while
`codex-tools` still has a different job: host-side shared tool metadata
and planning logic that is not “run this contributed tool”, such as spec
shaping, namespaces, discovery, code-mode augmentation, and
MCP/dynamic-to-Responses API conversion.
## What changed
- Moved the shared leaf tool-spec and JSON Schema types into
`codex-tool-api`, so the executable contract now lives with
[`ToolBundle`](c538758095/codex-rs/tool-api/src/bundle.rs (L19-L70)).
- Replaced the temporary extension-side tool wrapper with direct
`ToolBundle` use in `codex-extension-api`.
- Taught core to collect contributed bundles, include them in spec
planning, register them through
[`ToolRegistryBuilder::register_tool_bundle`](c538758095/codex-rs/core/src/tools/registry.rs (L653-L667)),
and dispatch them through the existing router/runtime path.
- Added focused coverage for contributed tools becoming model-visible
and dispatchable, plus spec-planning coverage for contributed function
and freeform tools.
## Verification
- Added `extension_tool_bundles_are_model_visible_and_dispatchable` in
`core/src/tools/router_tests.rs`.
- Added spec-plan coverage in `core/src/tools/spec_plan_tests.rs` for
contributed extension bundles.
## Related
- Follow-up to #22138
## Summary
- make the shared `ToolExecutor::is_mutating` default conservative by
returning `true`
- update the trait docs to say read-only tools should opt out explicitly
- add a regression test covering the default behavior
## Why
Hosts use this signal for serialization and approval policy. Treating
unknown contributed tools as read-only lets a write-capable tool
accidentally bypass mutating-tool safeguards if it forgets to override
the hook.
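The conservative default can be illustrated with a simplified trait. This is a sketch only: the real `ToolExecutor` in `codex-tool-api` has a different, larger surface, and the two tool types below are hypothetical.

```rust
/// Simplified stand-in for the shared executor trait.
pub trait ToolExecutor {
    /// Conservative default: assume a tool can mutate state unless it
    /// explicitly opts out.
    fn is_mutating(&self) -> bool {
        true
    }
}

/// Hypothetical write-capable tool that forgot to override the hook.
pub struct WriteFileTool;
impl ToolExecutor for WriteFileTool {}

/// Hypothetical read-only tool that opts out explicitly.
pub struct ListFilesTool;
impl ToolExecutor for ListFilesTool {
    fn is_mutating(&self) -> bool {
        false
    }
}
```

With the default set to `true`, the forgetful `WriteFileTool` still goes through mutating-tool safeguards; the cost of a missing override is extra caution rather than a bypass.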
## Validation
- not run, per request
## Why
The tool-extraction work needs one shared executable-tool seam that
hosts and tool owners can depend on without reaching into `codex-core`.
Landing that seam first makes the later tool-family ports incremental
and keeps the reusable contract separate from any one migration.
## What changed
- add a new `codex-tool-api` crate and workspace wiring
- move the common executable-tool contracts into that crate:
`ToolBundle`, `ToolDefinition`, `ToolExecutor`, `ToolCall`, `ToolInput`,
`ToolOutput`, `JsonToolOutput`, and `ToolError`
- keep host state generic through `ToolBundle<C>` / `ToolCall<C>` so
later integrations can provide their own runtime context without baking
core types into the API
- carry the host signals the runtime will need later, including
parallel-call support and mutability probing
- leave existing tool families in place for now; this PR only
establishes the reusable API surface
- add the Bazel target and lockfile updates for the new crate
## Testing
- `cargo test -p codex-tool-api`
## Why
Git commit attribution is prompt policy, not session orchestration.
After #21737 adds the extension-registry seam, this moves that
prompt-only behavior out of `codex-core` so `Session` can consume
extension-contributed prompt fragments instead of owning a one-off
policy path itself.
Before this PR, `Session` injected the trailer instruction directly from
`codex-core` ([session
assembly](a57a747eb6/codex-rs/core/src/session/mod.rs (L2733-L2739)),
[helper
module](a57a747eb6/codex-rs/core/src/commit_attribution.rs (L1-L33))).
This branch moves that same responsibility into
[`codex-git-attribution`](b5029a6736/codex-rs/ext/git-attribution/src/lib.rs (L14-L100)).
## What changed
- Added the `codex-git-attribution` extension crate.
- Snapshot `CodexGitCommit` plus `commit_attribution` at thread start,
then contribute the developer-policy fragment through the extension
registry.
- Register the extension in app-server thread extensions.
- Remove the old `codex-core` helper module and direct `Session`
injection path.
This keeps the existing behavior intact: the prompt is only contributed
when `CodexGitCommit` is enabled, blank attribution still disables the
trailer, and the default remains `Codex <noreply@openai.com>`.
## Stack
- Stacked on #21737.
## Why
[#21736](https://github.com/openai/codex/pull/21736) introduces the
typed extension API, but the runtime does not yet carry a registry
through thread/session startup or give contributors host-owned stores to
read from. This PR wires that host-side path so later feature migrations
can move product-specific behavior behind typed contributions without
adding another bespoke seam directly to `codex-core`.
## What changed
- Thread `ExtensionRegistry<Config>` through `ThreadManager`,
`CodexSpawnArgs`, `Session`, and sub-agent spawn paths.
- Wire `ThreadStartContributor` and `ContextContributor`.
- Expose the small supporting surface needed by non-core callers that
construct threads directly, including `empty_extension_registry()`
through `codex-core-api`.
This PR lands the host plumbing only: the app-server registry is still
empty, and concrete feature migrations are intended to follow
separately.
## Why
`codex-core` still owns a growing amount of product-specific behavior.
This PR starts the extraction path by introducing a small, typed
first-party extension seam: features can install the contribution
families they actually own, while the host keeps lifecycle and state
ownership instead of pushing a broad service locator into the API.
See `examples/` for illustration.
## Known limitations
* Tool contract definition will be shared with core
* Fragments must be extracted
* Missing some contributors
## Summary
- Populate `plugin/list` interface metadata for installed Git-sourced
marketplace plugins from the active cached plugin bundle.
- Preserve marketplace category precedence so list behavior matches
`plugin/read`.
- Keep existing fallback behavior when the cache or manifest is missing
or invalid.
## Test Plan
- `cd codex-rs && just fmt`
- `cd codex-rs && cargo test -p codex-core-plugins
list_marketplaces_installed_git_source_reads_metadata_from_cache_without_cloning`
- `cd codex-rs && cargo test -p codex-app-server
plugin_list_returns_installed_git_source_interface_from_cache`
- `cd codex-rs && just fix -p codex-core-plugins`
- `cd codex-rs && just fix -p codex-app-server`
- `git diff --check`
Server-truth check: OpenAI monorepo app-server generated types already
expose `PluginSummary.interface`, and the webview consumes it for plugin
cards. This PR keeps the protocol/schema unchanged and fills the
existing field from the cached installed bundle for Git-backed
cross-repo plugins.
## Why
The TUI currently treats Markdown tables as ordinary wrapped text, which
makes table-heavy responses hard to read and brittle across narrow panes
and terminal resizes.
This change teaches the TUI to render Markdown tables responsively while
preserving the raw Markdown source needed to re-render streamed and
finalized transcript content after width changes. The goal is to keep
tables legible during streaming, after resize, and once a turn has
finished, without corrupting scrollback ordering.
## What Changed
- add table detection and responsive table rendering in the Markdown
renderer
- render standard tables with Unicode box-drawing borders when the pane
is wide enough
- add a vertical readability fallback for constrained or dense tables so
narrow panes still show each row clearly
- keep links and `<br>` content inside table cells instead of leaking
text outside the table
- avoid table normalization inside fenced or indented code blocks
- preserve raw streamed Markdown source and keep the active table as a
mutable tail until finalization
- consolidate finalized streamed content into source-backed transcript
cells so post-resize re-rendering stays correct
- add snapshot and targeted streaming/resize regression coverage for the
new table behavior
## How to Test
1. Start Codex TUI from this branch.
2. Paste this exact prompt:
`This is a session to test codex, no need to do any thinking, just end
different markdown tables, with columns exploring different markdown
contents, like links, bold italic, code, etc. Make them different sizes,
some 30+ rows, some not and intertwine them with some paragraphs with
complex formatting as well.`
3. Confirm the response includes several Markdown tables mixed with
richly formatted paragraphs.
4. Confirm wide-enough tables render with box-drawing borders instead of
plain wrapped pipe text.
5. Resize the terminal narrower while the answer is still streaming and
confirm the in-progress table stays coherent instead of duplicating
headers or leaving broken scrollback behind.
6. Resize again after the turn finishes and confirm the finalized
transcript re-renders cleanly at the new width.
7. In a narrow pane, verify dense tables fall back to the vertical
per-row layout instead of producing unreadable wrapped columns.
8. Also verify pipe-heavy fenced code blocks still render as code, not
as tables.
Targeted tests:
- `cargo test -p codex-tui table_readability_fallback --no-fail-fast`
- `cargo test -p codex-tui markdown_render --no-fail-fast`
- `cargo test -p codex-tui streaming::controller --no-fail-fast`
- `cargo test -p codex-tui table_resize_lifecycle --no-fail-fast`
## Docs
No developer docs update appears necessary.
## Summary
The issue digest uses recent posts, comments, and reactions to decide
which issues deserve attention. A single active user could previously
raise an issue's apparent importance by commenting or reacting multiple
times in the window.
This changes `codex-issue-digest` so `user_interactions` counts unique
human GitHub users per issue across new issue posts, new comments, and
new reactions. Raw reaction/comment counts are still preserved for
detail output, and the skill guidance now describes `Interactions` as a
unique-human-user count.
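The counting rule can be sketched as a per-issue set of user logins. This is an illustrative sketch, not the `codex-issue-digest` code: the `Interaction` struct, its fields, and the bot flag are assumptions.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical flattened record: one post, comment, or reaction.
pub struct Interaction<'a> {
    pub issue: u64,
    pub user: &'a str,
    pub is_bot: bool,
}

/// Counts unique human users per issue, no matter how many times each
/// user posted, commented, or reacted within the window.
pub fn unique_user_interactions(events: &[Interaction<'_>]) -> HashMap<u64, usize> {
    let mut users_by_issue: HashMap<u64, HashSet<&str>> = HashMap::new();
    for event in events {
        if event.is_bot {
            continue;
        }
        users_by_issue.entry(event.issue).or_default().insert(event.user);
    }
    users_by_issue
        .into_iter()
        .map(|(issue, users)| (issue, users.len()))
        .collect()
}
```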
## Why
On native Windows, running `/mcp` can leak `taskkill`'s normal
`SUCCESS:` messages into the Codex TUI while the temporary MCP inventory
process tree is being torn down. That corrupts the screen even though
MCP itself is working correctly.
Fixes #20845.
## What Changed
- Redirect the Windows-only MCP teardown `taskkill` subprocess to null
stdio so its console output cannot reach the TUI.
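A minimal sketch of the redirect, using only `std::process`. The helper name and the exact `taskkill` arguments are assumptions for illustration; the actual teardown code lives in the MCP client.

```rust
use std::process::{Command, Stdio};

/// Builds a `taskkill` command for a process tree with all three stdio
/// streams attached to the null device, so nothing it prints can reach
/// the terminal that the TUI is drawing on.
pub fn quiet_taskkill(pid: u32) -> Command {
    let pid = pid.to_string();
    let mut cmd = Command::new("taskkill");
    // /T kills the whole tree, /F forces termination (assumed flags).
    cmd.args(["/PID", pid.as_str(), "/T", "/F"])
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null());
    cmd
}
```

On Windows, a caller would then run `quiet_taskkill(pid).status()` and inspect only the exit status.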
## How to Test
1. On native Windows, configure a stdio MCP server, for example:
```powershell
codex mcp add sequential-thinking -- npx -y @modelcontextprotocol/server-sequential-thinking
```
2. With the latest released Codex CLI, start Codex and run `/mcp`.
3. Confirm the current behavior: `taskkill` `SUCCESS:` lines appear in
the TUI during the MCP refresh.
4. Switch to this branch's build, start Codex again, and run `/mcp`.
5. Confirm the MCP inventory still renders normally and the `taskkill`
lines no longer appear.
6. Repeat `/mcp` once more on this branch to verify the regression does
not recur on repeated inventory requests.
Targeted tests:
- `cargo test -p codex-rmcp-client`
- `cargo test -p codex-rmcp-client --test process_group_cleanup --quiet`
## Why
Inside tmux, `Shift+Enter` can still reach Codex as a plain `Enter` even
when tmux has extended keys enabled. In `csi-u` tmux panes, Codex needs
to request `modifyOtherKeys` mode 2 so tmux moves the pane from `VT10x`
into extended-key mode and preserves the Shift modifier. Without that
extra request, composer `Shift+Enter` submits the draft instead of
inserting a newline.
Fixes #21699.
## What Changed
- Detect tmux sessions and read the active `extended-keys-format`,
preferring the pane-local value before falling back to the global
option.
- Request `modifyOtherKeys` mode 2 for tmux panes using `csi-u` extended
keys, and reset it when restoring keyboard reporting.
- Add unit coverage for tmux detection, the format gate, and the emitted
`modifyOtherKeys` escape sequence.
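The format gate and escape sequences can be sketched as below. `CSI > 4 ; 2 m` is the xterm control sequence that sets `modifyOtherKeys` to 2 and `CSI > 4 ; 0 m` disables it; the function names, and the choice of reset sequence, are assumptions rather than the exact code in `codex-tui`.

```rust
use std::io::{self, Write};

/// xterm "set modifyOtherKeys to 2" and "disable modifyOtherKeys".
pub const ENABLE_MODIFY_OTHER_KEYS_2: &str = "\x1b[>4;2m";
pub const DISABLE_MODIFY_OTHER_KEYS: &str = "\x1b[>4;0m";

/// Only tmux panes that use the `csi-u` extended-keys format need the request.
pub fn should_request_modify_other_keys(
    in_tmux: bool,
    extended_keys_format: Option<&str>,
) -> bool {
    in_tmux && extended_keys_format == Some("csi-u")
}

/// Writes the enable or disable sequence to the terminal.
pub fn write_modify_other_keys<W: Write>(out: &mut W, enable: bool) -> io::Result<()> {
    let seq = if enable {
        ENABLE_MODIFY_OTHER_KEYS_2
    } else {
        DISABLE_MODIFY_OTHER_KEYS
    };
    out.write_all(seq.as_bytes())?;
    out.flush()
}
```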
## How to Test
1. In tmux, configure:
```tmux
set-option -g extended-keys on
set-option -g extended-keys-format csi-u
```
2. Start Codex in a fresh tmux pane from this branch.
3. From another pane, confirm the Codex pane reports `mode=Ext 2`:
```bash
tmux list-panes -a -F '#{session_name}:#{window_index}.#{pane_index} mode=#{pane_key_mode} cmd=#{pane_current_command}'
```
4. Type a draft in the composer and press `Shift+Enter`; confirm it
inserts a newline instead of submitting.
5. Also confirm plain `Enter` still submits as before.
Targeted tests:
- `cargo test -p codex-tui`
## Notes
- Manual verification used both real `Shift+Enter` in iTerm2/tmux and
`tmux send-keys ... S-Enter` to confirm the tmux pane changes from
`VT10x` to `Ext 2` and preserves newline behavior.
- On this checkout, the broader `codex-tui` test run currently reaches
unrelated existing failures in `status::tests::*` plus a later stack
overflow in
`tests::fork_last_filters_latest_session_by_cwd_unless_show_all`.
### Motivation
- Normalize persisted service tier so selecting the request value
`priority` (or legacy `fast`) is stored as `fast` while preserving
unknown tier IDs and keeping request-time behavior unchanged.
### Description
- Update persistence logic in `codex-rs/core/src/config/edit.rs` so
`ConfigEdit::SetServiceTier` maps request values: `priority`/`fast` ->
`"fast"`, `flex` -> `"flex"`, and leaves unknown strings unchanged.
- Add unit tests in `codex-rs/core/src/config/edit_tests.rs` that verify
a `priority` selection is written to `config.toml` as `"fast"` and that
unknown tiers are preserved.
- Add a config load test in `codex-rs/core/src/config/config_tests.rs`
to ensure `service_tier = "priority"` still resolves to the `priority`
request value at load time.
- Add the required import `use
codex_protocol::config_types::ServiceTier;` to the edited modules.
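The persistence mapping described above amounts to a small normalization step. The sketch below works on plain strings under a hypothetical function name; the real change goes through `ConfigEdit::SetServiceTier` and `ServiceTier`.

```rust
/// Maps a requested service tier to the value written to `config.toml`.
/// `priority` and legacy `fast` persist as "fast", `flex` stays "flex",
/// and unknown tier IDs are preserved verbatim.
pub fn persisted_service_tier(requested: &str) -> String {
    match requested {
        "priority" | "fast" => "fast".to_string(),
        "flex" => "flex".to_string(),
        other => other.to_string(),
    }
}
```

Only the persisted value is normalized; per the description, `service_tier = "priority"` in an existing config still resolves to the `priority` request value at load time.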
### Testing
- Ran `just fmt` and `just fix -p codex-core` to apply formatting and
lints and they completed successfully.
- Ran `cargo test -p codex-core --lib service_tier` (focused unit tests
for the change) and the tests passed.
- Ran `cargo test -p codex-protocol` and the protocol test suite passed.
- Note: an initial broader `cargo test -p codex-core service_tier`
invocation matched integration tests and produced unrelated
failures/hangs, so that run was interrupted and the focused `--lib`
unit-test invocation was used instead.
------
[Codex
Task](https://chatgpt.com/codex/cloud/tasks/task_i_69ffc5a1262c8321af91b69c9845147f)
## Summary
`ChatWidget` has been carrying several independent domains in one large
state bag: transcript bookkeeping, turn lifecycle, queued input, status
surfaces, connectors, review mode, and protocol dispatch. That makes
otherwise-local changes hard to reason about because unrelated fields
and side effects live beside each other in `chatwidget.rs`.
This is the first cleanup PR in a larger decomposition effort. It does
not try to make `chatwidget.rs` small in one sweep; instead, it
establishes focused state boundaries that later handler, popup,
rendering, and effect-synchronization extractions can build on.
This PR keeps `ChatWidget` as the composition layer while moving focused
state into smaller `codex-tui` modules. The widget still owns effects
that touch the bottom pane, app events, command submission, redraw
scheduling, and terminal-title updates.
## Changes
- Add focused state modules under `codex-rs/tui/src/chatwidget/` for
input queues, turn lifecycle, transcript bookkeeping, status state,
connectors, review mode, and app-server protocol dispatch.
- Update `ChatWidget` to hold grouped state structs and route
input/lifecycle/status operations through those focused helpers.
- Move app-server notification dispatch into `chatwidget/protocol.rs`
while leaving feature handlers and side effects on `ChatWidget`.
- Replace the large manual `ChatWidget` test literal with the normal
constructor plus narrow test overrides, so future state moves do not
require every field to be restated in test setup.
- Update existing tests to access the new grouped state or narrower
helpers without changing snapshot behavior.
## Longer-term direction
Follow-up PRs can continue shrinking `chatwidget.rs` by moving behavior,
not just state, into focused modules:
- Extract input/submission flow, turn/stream handling, and tool-cell
lifecycles into domain modules that call the new state reducers.
- Move popup/settings builders and rendering helpers out of the main
widget file so `ChatWidget` stays focused on composition.
- Reduce direct `BottomPane` mutation by applying domain-specific sync
outputs at clearer boundaries.
## Why
Fixes #16688.
The TUI currently hydrates collab receiver metadata by awaiting
`thread/read` before each active-thread notification is rendered. During
large subagent fan-outs, the embedded app-server can be busy starting
agents and processing spawn work, so those synchronous metadata reads
queue behind the fan-out and block the TUI event loop. That makes the UI
appear frozen even though the underlying agent work can continue.
## What Changed
- Replaced eager `thread/read` metadata hydration on the active
notification path with local receiver-thread caching.
- Kept `ThreadStarted` and picker refreshes as the places that fill in
agent nickname/role metadata when it is available.
- Skipped caching receiver threads that are explicitly reported as
`NotFound`, avoiding live-looking ghost entries for failed stale-agent
calls.
- Added TUI tests covering both local receiver caching and `NotFound`
suppression.
## Verification
- `cargo test -p codex-tui collab_receiver_notification`
- `just fix -p codex-tui`
I also ran the full `cargo test -p codex-tui`; the new test passed, but
the full process later aborted with an unrelated stack overflow in
`tests::fork_last_filters_latest_session_by_cwd_unless_show_all`.
# Why
Hooks that need trust review were easy to miss, and the existing TUI
flow made users discover `/hooks` manually before they could decide
whether to inspect or trust them.
# What
- add a startup review prompt for new or changed hooks before normal
composer use
- add a top-level `t` shortcut in `/hooks` to trust every review-needed
hook at once
- make pending-review rows and helper copy use warning styling
## TUI
### Startup review interstitial
```text
Hooks need review
2 hooks are new or changed.
Hooks can run outside the sandbox after you trust them.
› 1. Review hooks
2. Trust all and continue
3. Continue without trusting (hooks won't run)
```
### Top-level `/hooks` page when review is needed
```text
Hooks
Lifecycle hooks from config and enabled plugins.
⚠ 1 hook needs review before it can run.
Event Installed Active Review Description
PreToolUse 1 0 1 Before a tool executes
...
Press t to trust all; enter to review hooks; esc to close
```
## Why
On light terminal backgrounds, selected rows in several TUI pickers were
rendered with the same bright cyan accent used on dark themes. Against
the light menu surface, that made the current selection hard to
distinguish at a glance.
<table><tr>
<td>
<p align="center">Before</p>
<img width="1109" height="864" alt="SCR-20260509-nmtz"
src="https://github.com/user-attachments/assets/b31ce0d0-19c2-4bdd-a220-7acc77bd8e8e"
/>
</td>
<td>
<p align="center">After</p>
<img width="1164" height="844" alt="SCR-20260509-nmox"
src="https://github.com/user-attachments/assets/7b3fede0-4739-4a9f-a979-cdbb7451841f"
/>
</td>
</tr></table>
## What changed
- Added a shared background-aware accent style for active/selected TUI
controls.
- Use a darker cyan-family accent on light backgrounds while preserving
the existing bright cyan accent on dark or unknown backgrounds.
- Reused that accent across shared picker rows and the custom
selection-like surfaces that had drifted separately: picker tabs, hooks
browsing, external-agent migration choices, and /keymap affordances.
- Added focused tests for the light/dark accent rule and rendered
selected-row styling.
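The light/dark accent rule can be sketched as a pure function of the detected background color. The luminance formula and the 128 threshold are assumptions for illustration, as are the type and function names; the source only states that light backgrounds get a darker cyan-family accent while dark or unknown backgrounds keep bright cyan.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Accent {
    BrightCyan,
    DarkCyan,
}

/// Rough perceived brightness check (Rec. 601 luma weights, assumed threshold).
pub fn is_light(rgb: (u8, u8, u8)) -> bool {
    let (r, g, b) = rgb;
    let luma = 0.299 * f32::from(r) + 0.587 * f32::from(g) + 0.114 * f32::from(b);
    luma > 128.0
}

/// Darker accent on light backgrounds; bright cyan on dark or unknown ones.
pub fn accent_for(background: Option<(u8, u8, u8)>) -> Accent {
    match background {
        Some(rgb) if is_light(rgb) => Accent::DarkCyan,
        _ => Accent::BrightCyan,
    }
}
```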
## How to Test
1. Start Codex in a terminal using a light background theme.
2. Type `/` to open the slash-command picker and move the selection
through a few rows.
3. Confirm that the selected row is visibly colored with strong contrast
instead of blending into the popup surface.
4. Open `/keymap` and confirm the active tab, selected rows, and picker
hint accents use the same light-theme accent treatment.
5. In a dark terminal theme, repeat the slash-picker check and confirm
the existing bright cyan selection styling is preserved.
Targeted tests:
- `cargo test -p codex-tui accent_style_uses_`
- `cargo test -p codex-tui selected_rows_use_the_shared_accent_style`
- `cargo test -p codex-tui
selected_event_rows_use_the_shared_accent_style`
Notes:
- A full `cargo test -p codex-tui` run reached the end of the suite but
hit an unrelated existing stack overflow in
`tests::fork_last_filters_latest_session_by_cwd_unless_show_all`.
## Why
Mixed prose lines that contained URLs started taking the URL-preserving
wrapping path, but that path could split ordinary words mid-token. A
follow-up issue remained in scrollback insertion: when already-rendered
indented rows were wrapped again, continuation rows could lose their
margin and fall back to terminal hard wrapping. Together those bugs made
normal Markdown output look broken around links, lists, blockquotes, and
indented content.
Separately, the local argument-comment lint wrappers failed under
environments that set `PYTHONSAFEPATH=1`, because Python no longer adds
the script directory to `sys.path` automatically. That prevented the
lint from reaching Rust callsites at all.
<img width="1778" height="1558" alt="CleanShot 2026-05-09 at 11 51 38"
src="https://github.com/user-attachments/assets/9274d150-1757-4f1a-89ac-5bdc9997d8cb"
/>
## What Changed
- Preserve URL tokens without turning every neighboring prose word into
a character-level split point.
- Add a mixed URL/prose wrapper that keeps ordinary words whole,
preserves leading whitespace, and re-splits long non-URL tokens against
the actual width available on continuation rows.
- Reuse a rendered history row's leading whitespace as the continuation
indent when scrollback insertion has to pre-wrap it again.
- Add regression coverage for markdown wrapping, history-cell rendering,
scrollback continuation margins, leading-indent width accounting, and
continuation-row re-splitting.
- Make both argument-comment lint entrypoints explicitly add their own
directory to `sys.path`, so sibling imports still work when
`PYTHONSAFEPATH=1`.
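A minimal sketch of the wrapping rules listed above, not the TUI's actual implementation. The helper name `wrap_mixed_line` is hypothetical, and it counts `char`s rather than terminal display width, which the real code would need to account for.

```rust
/// Hypothetical helper: wrap `line` to `width` columns, keeping ordinary
/// words and URL tokens whole and reusing the line's leading whitespace as
/// the continuation indent. Widths are counted in `char`s (an assumption).
fn wrap_mixed_line(line: &str, width: usize) -> Vec<String> {
    let indent_len = line.len() - line.trim_start().len();
    let indent = &line[..indent_len];
    // Width left for content on every row, including continuation rows.
    let avail = width.saturating_sub(indent.chars().count()).max(1);

    let mut rows = Vec::new();
    let mut current = String::new();
    for word in line.split_whitespace() {
        let is_url = word.contains("://");
        let mut rest = word.to_string();
        loop {
            let rest_width = rest.chars().count();
            let needed = if current.is_empty() {
                rest_width
            } else {
                current.chars().count() + 1 + rest_width
            };
            if needed <= avail {
                if !current.is_empty() {
                    current.push(' ');
                }
                current.push_str(&rest);
                break;
            }
            if !current.is_empty() {
                // Flush the row and retry the token on a fresh row.
                rows.push(format!("{indent}{current}"));
                current.clear();
                continue;
            }
            if is_url {
                // URLs are never split; the row is allowed to overflow.
                rows.push(format!("{indent}{rest}"));
                break;
            }
            // Long non-URL token: re-split against the available width.
            let split_at = rest
                .char_indices()
                .nth(avail)
                .map(|(i, _)| i)
                .unwrap_or(rest.len());
            let tail = rest.split_off(split_at);
            rows.push(format!("{indent}{rest}"));
            rest = tail;
        }
    }
    if !current.is_empty() {
        rows.push(format!("{indent}{current}"));
    }
    rows
}
```

With a two-space indent and a width of 16, the prose words around a long URL stay whole, the URL gets its own overflowing row, and every row keeps the two-space margin.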
## How to Test
1. Start Codex and render a long Markdown response that mixes prose with
inline links, blockquotes, lists, and indented code-like text.
2. Confirm that ordinary words next to links stay whole instead of
breaking mid-word.
3. Resize or replay the transcript and confirm wrapped continuation rows
keep their expected left margin for blockquotes, lists, and indented
content.
4. Run the source argument-comment lint from a shell with
`PYTHONSAFEPATH=1` and confirm it starts normally instead of failing to
import `wrapper_common`.
Targeted tests:
- `cargo test -p codex-tui mixed_line --lib`
- `cargo test -p codex-tui preserves_prefix_on_wrapped_rows --lib`
- `cargo test -p codex-tui
agent_markdown_cell_does_not_split_words_after_inline_markdown --lib`
- `cargo test -p codex-tui
mixed_url_markdown_wraps_prose_without_splitting_words_snapshot --lib`
- `python3 tools/argument-comment-lint/test_wrapper_common.py`
- `just argument-comment-lint-from-source -p codex-tui -- --lib`
Notes:
- `cargo test -p codex-tui` currently reaches the new tests
successfully, then still aborts in the pre-existing
`tests::fork_last_filters_latest_session_by_cwd_unless_show_all`
stack-overflow failure.
## Why
[PR #1705](https://github.com/openai/codex/pull/1705) moved
`apply_patch` execution under the configured sandbox and called out the
need for integration coverage. We already covered textual `../` escapes,
but did not have coverage for link aliases that live inside a writable
workspace while pointing at, or aliasing, files visible outside it.
This PR locks in the current sandbox boundary without changing
production write semantics. Symlink escapes into a read-only outside
root should fail and leave the outside file unchanged. Existing hard
links are characterized separately: if a user-created hard link already
exists inside the writable root, sandboxed writes preserve normal
hard-link semantics rather than replacing the link and silently breaking
that relationship.
## What Changed
- Added
`apply_patch_cli_does_not_write_through_symlink_escape_outside_workspace`
to verify `apply_patch` cannot update a symlink that targets a file
outside the writable workspace.
- Added `apply_patch_cli_preserves_existing_hard_link_outside_workspace`
to verify `apply_patch` intentionally writes through an existing hard
link and does not unlink or replace it.
- Added `file_system_sandboxed_write_preserves_existing_hard_link` to
verify sandboxed `fs/writeFile` preserves an existing hard link and
writes the shared inode.
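For reference, the hard-link semantics being characterized can be sketched with the standard library alone. This is an illustration of in-place writes through a hard link, not the test code added in this PR; the file names and the `write_through_hard_link` helper are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: create a file and a hard link to it inside `dir`,
/// write through the link, and return what each path reads back.
fn write_through_hard_link(dir: &Path) -> io::Result<(String, String)> {
    let original = dir.join("original.txt");
    let link = dir.join("link.txt");
    // Ignore the result: the link may not exist on a first run.
    let _ = fs::remove_file(&link);

    fs::write(&original, "before\n")?;
    fs::hard_link(&original, &link)?;

    // `fs::write` truncates and rewrites the existing file in place, so both
    // names keep pointing at the same inode and see the new content.
    fs::write(&link, "after\n")?;

    Ok((fs::read_to_string(&original)?, fs::read_to_string(&link)?))
}
```

A write strategy that creates a temporary file and renames it over `link.txt` would instead leave `original.txt` with its old content, which is the silent break in the relationship that the tests guard against.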
## Testing
- `cargo test -p codex-exec-server file_system_sandboxed_write`
- `cargo test -p codex-core
apply_patch_cli_does_not_write_through_symlink_escape_outside_workspace`
- `cargo test -p codex-core
apply_patch_cli_preserves_existing_hard_link_outside_workspace`
- `just fix -p codex-exec-server -p codex-core`
- `just fix -p codex-core`
## Why
Service-tier slash commands are built from model-catalog metadata. If
the catalog returns a name like `Fast`, the TUI currently exposes
`/Fast` and exact dispatch expects that casing, which is inconsistent
with the lowercase command style used elsewhere.
## What
- Lowercase service-tier command names when converting catalog tiers
into `ServiceTierCommand` values.
- Add regression coverage that seeds a catalog tier named `Fast` and
expects the generated command to be `fast`.
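The normalization amounts to the following sketch. The `ServiceTierCommand` struct here is a simplified stand-in with a single assumed `name` field, and `commands_from_catalog` is a hypothetical helper, not the TUI's conversion function.

```rust
/// Simplified stand-in for the TUI type; the real struct may carry more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ServiceTierCommand {
    name: String,
}

/// Hypothetical conversion: lowercase each catalog tier name so the
/// generated slash command matches the lowercase style used elsewhere.
fn commands_from_catalog(tier_names: &[&str]) -> Vec<ServiceTierCommand> {
    tier_names
        .iter()
        .map(|name| ServiceTierCommand {
            name: name.to_lowercase(),
        })
        .collect()
}
```

A catalog tier named `Fast` therefore yields the command `fast`, so the picker shows `/fast` and exact dispatch matches it.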
## Testing
Not run locally per repo instruction; PR CI should run the new
`service_tier_commands_lowercase_catalog_names` coverage.
## Why
The Python SDK previously protected the stdio transport with a single
active turn-consumer guard. That avoided competing reads from stdout,
but it also meant one `Codex`/`AsyncCodex` client could not stream
multiple active turns at the same time. Notifications could also arrive
before the caller received a `TurnHandle` and registered for streaming,
so the SDK needed an explicit routing layer instead of letting
individual API calls read directly from the shared transport.
## What Changed
- Added a private `MessageRouter` that owns per-request response queues,
per-turn notification queues, pending turn-notification replay, and
global notification delivery behind a single stdout reader thread.
- Generated typed notification routing metadata so turn IDs come from
known payload shapes instead of router-side attribute guessing, with
explicit fallback handling for unknown notification payloads.
- Updated sync and async turn streaming so `TurnHandle.stream()`/`run()`
and `stream_text()` consume only notifications for their own turn ID,
while `AsyncAppServerClient` no longer serializes all transport calls
behind one async lock.
- Cleared pending turn-notification buffers when unregistered turns
complete so never-consumed turn handles do not leave stale queues
behind.
- Removed the internal stream-until helper now that turn completion
waiting can register directly with routed turn notifications.
- Updated Python SDK docs and focused tests for concurrent transport
calls, interleaved turn routing, buffered early notifications, unknown
notification routing, async delegation, and routed turn completion
behavior.
## Validation
- `uv run --extra dev ruff format scripts/update_sdk_artifacts.py
src/codex_app_server/_message_router.py src/codex_app_server/client.py
src/codex_app_server/generated/notification_registry.py
tests/test_client_rpc_methods.py
tests/test_public_api_runtime_behavior.py
tests/test_async_client_behavior.py`
- `uv run --extra dev ruff check scripts/update_sdk_artifacts.py
src/codex_app_server/_message_router.py src/codex_app_server/client.py
src/codex_app_server/generated/notification_registry.py
tests/test_client_rpc_methods.py
tests/test_public_api_runtime_behavior.py
tests/test_async_client_behavior.py`
- `uv run --extra dev pytest tests/test_client_rpc_methods.py
tests/test_public_api_runtime_behavior.py
tests/test_async_client_behavior.py`
- `git diff --check`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
The model-visible `<network>` context currently repeats indentation and
a pair of XML tags for every allowed or denied domain. Large domain sets
spend a surprising amount of prompt budget on that scaffolding instead
of the actual policy values.
## What changed
- Render allowed domains as one comma-separated `<allowed>` value
instead of one element per domain.
- Render denied domains the same way.
- Keep the full allow/deny domain sets model-visible while updating the
serialization and settings-update coverage for the denser shape.
## Example
Before:
```xml
<network enabled="true">
<allowed>api.example.test</allowed>
<allowed>cdn.example.test</allowed>
<denied>blocked.example.test</denied>
</network>
```
After:
```xml
<network enabled="true"><allowed>api.example.test,cdn.example.test</allowed><denied>blocked.example.test</denied></network>
```
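A minimal sketch of the denser rendering, assuming a free function and plain string slices rather than the real environment-context types. `render_network` is a hypothetical name, and omitting an element when its list is empty is an assumption not confirmed by this PR.

```rust
/// Hypothetical renderer: emit one comma-separated `<allowed>` and one
/// `<denied>` value instead of one element per domain.
fn render_network(enabled: bool, allowed: &[&str], denied: &[&str]) -> String {
    let mut out = format!("<network enabled=\"{enabled}\">");
    // Assumption: an empty list produces no element at all.
    if !allowed.is_empty() {
        out.push_str(&format!("<allowed>{}</allowed>", allowed.join(",")));
    }
    if !denied.is_empty() {
        out.push_str(&format!("<denied>{}</denied>", denied.join(",")));
    }
    out.push_str("</network>");
    out
}
```

Called with the domains from the example, this produces the single-line "After" shape shown above.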
## Validation
- `cargo test -p codex-core environment_context`
- `cargo test -p codex-core
build_settings_update_items_emits_environment_item_for_network_changes`
- Ran a local `codex` session with a real network context containing 121
allowed domains and 42 denied domains, then inspected the raw prompt
with `raw_token_viewer_cli.py`. With the same domain set, the rendered
`<network>` section shrank from 7,175 characters across 161 lines to
3,666 characters on one line, and the containing environment-context
block fell from 6,428 tokens to 5,379 tokens.
Expose discoverability and full share principals in share context, carry
roles through save/updateTargets, hydrate local shared plugin reads, and
keep share URLs only under plugin.shareContext.
## Why
The app-server watcher relocation leaves the generic filesystem watcher
as the last watcher-specific implementation still living inside
`codex-core`. Moving that code to a small crate keeps `codex-core`
focused on thread execution and lets app-server depend on the watcher
without reaching back into core for filesystem watching primitives.
This PR is stacked on #21287.
## What changed
- Added a new `codex-file-watcher` crate containing the existing watcher
implementation and its unit tests.
- Updated app-server `fs_watch`, `skills_watcher`, and listener state to
import watcher types from `codex-file-watcher`.
- Removed the `file_watcher` module and `notify` dependency from
`codex-core`.
- Updated Cargo workspace metadata and `Cargo.lock` for the new internal
crate.
## Validation
- `cargo check -p codex-file-watcher -p codex-core -p codex-app-server`
- `cargo test -p codex-file-watcher`
- `cargo test -p codex-app-server
skills_changed_notification_is_emitted_after_skill_change`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `just fix -p codex-file-watcher`
- `just fix -p codex-core`
- `just fix -p codex-app-server`
## Why
PR #21460 reverted the earlier move of skills change watching from
`codex-core` into app-server. This reapplies that boundary change so
app-server owns client-facing `skills/changed` notifications and core no
longer carries the watcher.
## What
- Restore the app-server `SkillsWatcher` and register it from thread
listener setup.
- Remove the core-owned skills watcher and its core live-reload
integration surface.
- Restore app-server coverage for `skills/changed` notifications after a
watched skill file changes.
## Validation
- `cargo test -p codex-app-server --test all
suite::v2::skills_list::skills_changed_notification_is_emitted_after_skill_change
-- --exact --nocapture`
- `cargo test -p codex-core --lib --no-run`
## Why
We'd like SQLite state to become required and load-bearing. As a first
step, let's remove the mechanism that allows us to blow away the SQLite
DB on a version bump, and instead rely on graceful migrations.
The original motivation
([PR](https://github.com/openai/codex/pull/10623)) behind this mechanism
was to care less about backwards compatibility while SQLite was being
landed, but I'd say it's quite important now to keep the data in it.
## What changed
- Make `STATE_DB_FILENAME` and `LOGS_DB_FILENAME` the full canonical
filenames: `state_5.sqlite` and `logs_2.sqlite`.
- Remove `STATE_DB_VERSION` / `LOGS_DB_VERSION` and the helper that
constructed filenames from versions.
- Stop `StateRuntime::init` from scanning for or deleting older SQLite
DB filenames at startup.
- Delete the tests that encoded legacy state/logs DB deletion behavior.
## Verification
- `cargo test -p codex-state`
## Why
Amazon Bedrock Mantle needs a stable client-agent header so requests
from the built-in Bedrock provider can be identified as coming from
Codex for the safety stack.
## What changed
- Added `x-amzn-mantle-client-agent: codex` to the built-in Amazon
Bedrock provider default HTTP headers.
## Why
Desktop and mobile Codex clients need a machine-readable way to
bootstrap and manage `codex app-server` on remote machines reached over
SSH. The same flow is also useful for bringing up app-server with
`remote_control` enabled on a fresh developer machine and keeping that
managed install current without requiring a human session.
## What changed
- add the new experimental `codex-app-server-daemon` crate and wire it
into `codex app-server daemon` lifecycle commands: `start`, `restart`,
`stop`, `version`, and `bootstrap`
- add explicit `enable-remote-control` and `disable-remote-control`
commands that persist the launch setting and restart a running managed
daemon so the change takes effect immediately
- emit JSON success responses for daemon commands so remote callers can
consume them directly
- support a Unix-only pidfile-backed detached backend for lifecycle
management
- assume the standalone `install.sh` layout for daemon-managed binaries
and always launch `CODEX_HOME/packages/standalone/current/codex`
- add bootstrap support for the standalone managed install plus a
detached hourly updater loop
- harden lifecycle management around concurrent operations, pidfile
ownership, stale state cleanup, updater ownership, managed-binary
preflight, Unix-only rejection, forced shutdown after the graceful
window, and updater process-group tracking/cleanup
- document the experimental Unix-only support boundary plus the
standalone bootstrap/update flow in
`codex-rs/app-server-daemon/README.md`
## Verification
- `cargo test -p codex-app-server-daemon -p codex-cli`
- live pid validation on `cb4`: `bootstrap --remote-control`, `restart`,
`version`, `stop`
## Follow-up
- Add updater self-refresh so the long-lived `pid-update-loop` can
replace its own executable image after installing a newer managed Codex
binary.
## Why
The environment-backed exec-server transport currently hardcodes 5
second connect and initialize timeouts in `client_transport.rs`. That is
short for SSH-backed stdio environments and remote websocket
environments, and there is currently no way to raise those values from
`CODEX_HOME/environments.toml`.
This stacked follow-up raises the default environment transport timeouts
and lets each configured environment override them in
`environments.toml`.
## What Changed
- raise the default environment transport connect and initialize
timeouts from 5s to 10s
- store concrete timeout values on `ExecServerTransportParams` instead
of hardcoding them in `connect_for_transport(...)`
- add `connect_timeout_sec` and `initialize_timeout_sec` to
`[[environments]]` entries in `environments.toml`
- apply parse-time defaults so runtime transport code receives fully
resolved timeout values
- reject `connect_timeout_sec` on stdio environments because it only
applies to websocket transports
- extend parser tests to cover the new fields and defaults
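The parse-time resolution described above can be sketched as follows. The types and the `resolve_timeouts` function are hypothetical simplifications; only the field names `connect_timeout_sec` and `initialize_timeout_sec`, the 10s default, and the stdio rejection come from this PR.

```rust
use std::time::Duration;

/// Default for both timeouts after this change (raised from 5s).
const DEFAULT_TIMEOUT_SECS: u64 = 10;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transport {
    Stdio,
    WebSocket,
}

/// Fully resolved values handed to runtime transport code.
#[derive(Debug, PartialEq, Eq)]
struct ResolvedTimeouts {
    connect: Duration,
    initialize: Duration,
}

/// Hypothetical parse-time step: apply defaults and reject
/// `connect_timeout_sec` on stdio environments.
fn resolve_timeouts(
    transport: Transport,
    connect_timeout_sec: Option<u64>,
    initialize_timeout_sec: Option<u64>,
) -> Result<ResolvedTimeouts, String> {
    if transport == Transport::Stdio && connect_timeout_sec.is_some() {
        return Err("connect_timeout_sec only applies to websocket environments".to_string());
    }
    Ok(ResolvedTimeouts {
        connect: Duration::from_secs(connect_timeout_sec.unwrap_or(DEFAULT_TIMEOUT_SECS)),
        initialize: Duration::from_secs(initialize_timeout_sec.unwrap_or(DEFAULT_TIMEOUT_SECS)),
    })
}
```

Resolving at parse time means the transport layer never has to know whether a value came from `environments.toml` or from the default.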
## Stack
- base: https://github.com/openai/codex/pull/21794
- this PR: configurable environment transport timeouts
## Validation
- `cd
/Users/starr/code/codex-worktrees/exec-env-timeouts-config-20260508/codex-rs
&& just fmt`
- not run: tests
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Support registry-backed remote executors end to end so downstream
services can resolve an executor id into an exec-server URL and make
that environment available to Codex without relying on the legacy cloud
environments flow.
## What changed
- switch remote executor registration to the executor registry bootstrap
contract
- allow named remote environments to be inserted into
`EnvironmentManager` at runtime
- add the experimental app-server RPC `environment/add` so initialized
experimental clients can register those remote environments for later
`thread/start` and `turn/start` selection
## Validation
Ran focused validation locally:
- `cargo test -p codex-exec-server environment_manager_`
- `cargo test -p codex-exec-server
register_executor_posts_with_bearer_token_header`
- `cargo test -p codex-app-server-protocol`
## Why
The app-server daemon work needs two app-server behaviors to be safe
when lifecycle management is driven by a helper process:
- a readiness probe must not become the process-wide client identity
just because it connects first
- a graceful reload signal needs to keep draining active turns even if
it is delivered more than once
## What changed
- Treat `codex_app_server_daemon` initialization as a probe-only client
for process-global originator and user-agent suffix state.
- Distinguish forceable shutdown signals from graceful-only ones, and
treat Unix `SIGHUP` as graceful-only while leaving `SIGTERM` and Ctrl-C
forceable.
- Add regression coverage for daemon probe initialization and repeated
`SIGHUP` delivery while a turn is still running.
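The signal split can be pictured with the sketch below. The enums and `on_signal` are hypothetical, and the rule that a repeated forceable signal ends draining early is an assumption inferred from the "forceable" wording rather than something this description states outright.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ShutdownSignal {
    Sighup,
    Sigterm,
    CtrlC,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ShutdownAction {
    StartDraining,
    KeepDraining,
    ForceExit,
}

/// `SIGHUP` is graceful-only; `SIGTERM` and Ctrl-C are forceable.
fn is_forceable(signal: ShutdownSignal) -> bool {
    !matches!(signal, ShutdownSignal::Sighup)
}

/// Hypothetical decision function: the first signal starts draining; a
/// repeat only forces exit when the signal is forceable (assumption).
fn on_signal(signal: ShutdownSignal, already_draining: bool) -> ShutdownAction {
    match (already_draining, is_forceable(signal)) {
        (false, _) => ShutdownAction::StartDraining,
        (true, true) => ShutdownAction::ForceExit,
        (true, false) => ShutdownAction::KeepDraining,
    }
}
```

Under this model a helper process can send `SIGHUP` repeatedly without cutting off a turn that is still running.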
## Testing
- `cargo test -p codex-app-server`
- The new daemon-probe and repeated-`SIGHUP` coverage passed.
- The run still failed in the existing
`suite::conversation_summary::get_conversation_summary_by_relative_rollout_path_resolves_from_codex_home`
and
`suite::conversation_summary::get_conversation_summary_by_thread_id_reads_rollout`
tests because their initialize handshake timed out.
- `cargo test -p codex-app-server --test all
suite::conversation_summary::`
- Reproduced the same two existing initialize-timeout failures in
isolation.
## Summary
- make `EnvironmentProvider::snapshot` path-free and keep providers
focused on provider-owned remote environments
- let provider snapshots request local inclusion via `include_local`, with
`environments.toml` including local and `CODEX_EXEC_SERVER_URL` excluding
local
- move reserved local environment construction into `EnvironmentManager`
using `ExecServerRuntimePaths`
Follow-up to https://github.com/openai/codex/pull/20667
## Testing
- `just fmt`
- `git diff --check`
- devbox: `bazel build --bes_backend= --bes_results_url=
//codex-rs/exec-server:exec-server`
- devbox: `bazel test --bes_backend= --bes_results_url=
//codex-rs/exec-server:exec-server-unit-tests`
Co-authored-by: Codex <noreply@openai.com>
## Why
PR CI should test the exact commit that was pushed to the PR branch. By
default, GitHub's `pull_request` event checks out a synthetic merge
commit from `refs/pull/<number>/merge`, so the tested tree can include
an implicit merge with the current base branch instead of matching the
pushed head SHA.
Using the PR head SHA makes each check result correspond to a concrete
commit the author submitted. This also behaves better for stacked PR
workflows, including Sapling stacks and other Git stack tooling: a
middle PR's head commit already contains the lower stack changes in its
tree, without pulling in commits above it or GitHub's temporary merge
ref.
## What Changed
- Set every `actions/checkout` in `pull_request` workflows under
`.github/workflows` to use `github.event.pull_request.head.sha` on PR
events and `github.sha` otherwise.
- Updated `blob-size-policy` to compare
`github.event.pull_request.base.sha` and
`github.event.pull_request.head.sha`, since it no longer checks out
GitHub's merge commit where `HEAD^1`/`HEAD^2` represented the PR range.
## Verification
- Parsed the edited workflow YAML files with Ruby.
- Checked that every checkout block in the `pull_request` workflows has
the PR-head `ref`.
## Summary
Startup tool construction currently depends on connector directory
metadata for `tool_suggest` discoverables. On a cold directory cache,
that can put slow connector-directory requests on the blocking path even
though the tools array only needs directory data for install
suggestions, not for the live connector MCP tools themselves.
This PR keeps the discoverables path off that cold network fetch:
- read connector directory metadata from cache only when building
discoverable tools
- persist connector directory metadata to
`~/.codex/cache/codex_app_directory/<hash>.json` and use it to hydrate
the in-memory cache on later runs before the normal refresh path updates
it
- use connector-directory-specific cache naming to distinguish this
metadata cache from the separate Codex Apps tools-spec cache
This reduces first-turn startup work without changing how live connector
MCP tools are sourced. Longer term, directory-backed install suggestions
should move to a search-based flow so they no longer need to be inlined
into the tools prompt at all.
## Testing
- `cargo test -p codex-connectors`
- `cargo test -p codex-chatgpt`
- `cargo test -p codex-core
request_plugin_install_is_available_without_search_tool_after_discovery_attempts`
- `cargo test -p codex-core
tool_suggest_uses_connector_id_fallback_when_directory_cache_is_empty`
## Summary
In https://github.com/openai/codex/pull/21584, we disabled doctests for
crates that lack any doctests. We can enforce that property via `cargo
shear --deny-warnings`: crates that lack doctests will be flagged if
doctests are enabled, and crates with doctests will be flagged if
doctests are disabled.
A few additional notes:
- By adding `--deny-warnings`, `cargo shear` also flagged a number of
modules that were not reachable at all. Some of those have been removed.
- This PR removes a usage of `windows_modules!` (since `cargo shear` and
`rustfmt` couldn't see through it) in favor of simple `#[cfg(target_os =
"windows")]` attributes. As a consequence, many of these files exhibit churn
in this PR, since they weren't being formatted by `rustfmt` at all on
main.
- Again, to make the code more analyzable, this PR also removes some
usages of `#[path = "cwd_junction.rs"]` in favor of a more standard
module structure. The bin sidecar structure is still retained, but,
e.g., `windows-sandbox-rs/src/bin/command_runner.rs` was moved to
`windows-sandbox-rs/src/bin/command_runner/main.rs`, and so on.
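As an illustration of the attribute-based gating, here is a sketch in which the module name `platform` and its contents are hypothetical and unrelated to the actual `windows-sandbox-rs` modules. A plain `#[cfg]` on an ordinary `mod` item is visible to tools without expanding a macro.

```rust
// Gated with a plain attribute, so formatters and dependency analyzers can
// see the module declaration directly.
#[cfg(target_os = "windows")]
mod platform {
    pub fn name() -> &'static str {
        "windows"
    }
}

// Fallback for every other target, so callers compile everywhere.
#[cfg(not(target_os = "windows"))]
mod platform {
    pub fn name() -> &'static str {
        "other"
    }
}

/// Hypothetical caller: resolves to whichever module was compiled in.
pub fn platform_name() -> &'static str {
    platform::name()
}
```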
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
The legacy `AfterToolUse` hook path was still wired through core tool
dispatch even though the hooks registry never populated any handlers for
it. The supported hook surface is `PostToolUse`, so the old
infrastructure was dead code on the hot path.
## What changed
- Removed the legacy `AfterToolUse` dispatch from `codex-core` tool
execution.
- Removed the unused legacy hook payload types and exports from
`codex-hooks`.
- Simplified legacy notify handling now that `HookEvent` only carries
`AfterAgent`.
## Validation
- `cargo test -p codex-hooks`
- `cargo test -p codex-core registry`
## Why
`apply_patch` is now a freeform/custom tool. Keeping the old
JSON/function-style registration and parsing path left another way for
models and tests to invoke `apply_patch`, which made the tool surface
harder to reason about.
## What changed
- Removed the `ApplyPatchToolType::Function` variant, JSON `apply_patch`
spec, and handler support for function payloads.
- Kept `apply_patch_tool_type = freeform` as the supported model
metadata path, including Bedrock catalog metadata.
- Migrated `apply_patch` tests and SSE fixtures to custom/freeform tool
calls.
## Verification
- `cargo test -p codex-tools -p codex-protocol -p codex-model-provider`
- `cargo test -p codex-core tools::handlers::apply_patch --lib`
- `cargo test -p codex-core --test all
apply_patch_tool_executes_and_emits_patch_events`
- `cargo test -p codex-core --test all
apply_patch_reports_parse_diagnostics`
- `cargo test -p codex-exec test_apply_patch_tool`
- `just fix -p codex-core`
- `just fix -p codex-tools -p codex-protocol -p codex-model-provider -p
codex-exec`
## Summary
TL;DR: teaches `codex-rs` / app-server to request a desktop-provided
attestation token and attach it as `x-oai-attestation` on the scoped
ChatGPT Codex request paths.

## Details
This PR teaches the Codex app-server runtime how to request and attach
an attestation token. It does not generate DeviceCheck tokens directly;
instead, it relies on the connected desktop app to advertise that it can
generate attestation and then asks that app for a fresh header value
when needed.
The flow is:
1. The Codex desktop app connects to app-server.
2. During `initialize`, the app can advertise that it supports
`requestAttestation`.
3. Before app-server calls selected ChatGPT Codex endpoints, it sends
the internal server request `attestation/generate` to the app.
4. app-server receives a pre-encoded header value back.
5. app-server forwards that value as `x-oai-attestation` on the scoped
outbound requests.
The code in this repo is mostly protocol and runtime plumbing: it adds
the app-server request/response shape, introduces an attestation
provider in core, wires that provider into Responses / compaction /
realtime setup paths, and covers the intended scoping with tests. The
signed macOS DeviceCheck generation remains owned by the desktop app PR.
## Related PR
- Codex desktop app implementation:
https://github.com/openai/openai/pull/878649
## Validation
<details>
<summary>Tests run</summary>
```sh
cargo test -p codex-app-server-protocol
cargo test -p codex-core attestation --lib
cargo test -p codex-app-server --lib attestation
```
Also ran:
```sh
just fix -p codex-core
just fix -p codex-app-server
just fix -p codex-app-server-protocol
just fmt
just write-app-server-schema
```
</details>
<details>
<summary>E2E DeviceCheck validation</summary>
First validated the signed desktop app boundary directly: launched a
packaged signed `Codex.app`, sent `attestation/generate`, decoded the
returned `v1.` attestation header, and validated the extracted
DeviceCheck token with `personal/jm/verify_devicecheck_token.py` using
bundle ID `com.openai.codex`. Apple returned `status_code: 200` and
`is_ok: true`.
Then ran the fuller app + app-server flow. The packaged `Codex.app`
launched a current-branch app-server via `CODEX_CLI_PATH`, and a local
MITM proxy intercepted outbound `chatgpt.com` traffic. The app-server
requested `attestation/generate` from the real Electron app process, and
the intercepted `/backend-api/codex/responses` traffic included
`x-oai-attestation` on both routes:
```text
GET /backend-api/codex/responses Upgrade: websocket x-oai-attestation: present
POST /backend-api/codex/responses Upgrade: none x-oai-attestation: present
```
The captured header decoded to a DeviceCheck token that also validated
with Apple for `com.openai.codex` (`status_code: 200`, `is_ok: true`,
team `2DC432GLL2`).
</details>
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`ToolName::display()` made it too easy to flatten tool identity and
accidentally compare rendered strings. Tool identity should stay
structural until a legacy string boundary actually requires the
flattened spelling.
## What
- Removes `ToolName::display()` and relies on the existing `Display`
impl for messages and errors.
- Adds structural ordering for `ToolName` and uses it for
sorting/deduping deferred tools.
- Carries `ToolName` through tool/sandbox plumbing, flattening only at
legacy boundaries such as hook payloads, telemetry tags, and Responses
tool names.
- Updates MCP normalization tests to assert `ToolName` structure instead
of rendered strings.
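The hazard with comparing rendered strings can be shown with a small stand-in type. The `namespace`/`name` fields, the `__` separator, and the helper functions are all assumptions for illustration; the real `ToolName` layout and flattened spelling may differ.

```rust
use std::fmt;

/// Stand-in for `ToolName`; derives give it structural equality and ordering.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct ToolName {
    namespace: Option<String>,
    name: String,
}

impl fmt::Display for ToolName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.namespace {
            // Hypothetical flattened spelling with a `__` separator.
            Some(ns) => write!(f, "{ns}__{}", self.name),
            None => write!(f, "{}", self.name),
        }
    }
}

fn tool(namespace: Option<&str>, name: &str) -> ToolName {
    ToolName {
        namespace: namespace.map(str::to_string),
        name: name.to_string(),
    }
}

/// Sort and dedupe on structure rather than on rendered strings.
fn sort_and_dedup(mut tools: Vec<ToolName>) -> Vec<ToolName> {
    tools.sort();
    tools.dedup();
    tools
}
```

Here `tool(Some("a"), "b__c")` and `tool(Some("a__b"), "c")` render identically but remain distinct under structural comparison, so deduping keeps both.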
## Testing
- `cargo test -p codex-mcp test_normalize_tools`
- `cargo test -p codex-core unavailable_tool`
- `just fix -p codex-protocol`
- `just fix -p codex-mcp`
- `just fix -p codex-core`
## Why
Codex assisted-code attribution needs a client-side accepted-code source
that does not upload raw code. This adds a hash-only analytics event
derived from the turn diff so downstream attribution can compare
accepted Codex lines against commit or PR diffs.
## What Changed
- Parse accepted/effective added lines from the final turn diff and emit
`codex_accepted_line_fingerprints` analytics.
- Hash repo, path, and normalized line content before upload; raw code
and raw diffs are not included in the event.
- Chunk large fingerprint payloads and send accepted-line fingerprint
events in isolated requests while preserving normal batching for other
analytics events.
- Canonicalize Git remote URLs before repo hashing so SSH/HTTPS GitHub
remotes join to the same repo hash.
- Add parser coverage for unified diff hunk lines that look like `+++`
or `---` file headers.
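To show why canonicalization matters before hashing, here is a rough sketch that reduces common SSH and HTTPS spellings of a remote to one `host/path` form. The function name `canonical_remote` and its exact rules are hypothetical; the implementation in `codex-git-utils` may handle more cases (ports, case folding, other hosts).

```rust
/// Hypothetical canonicalizer: map SSH and HTTPS remote spellings to the
/// same `host/owner/repo` string so they hash identically.
fn canonical_remote(url: &str) -> String {
    let trimmed = url.trim();
    let without_scheme = match trimmed.split_once("://") {
        Some((_scheme, rest)) => rest.to_string(),
        // scp-like syntax (`user@host:path`) uses ':' where URLs use '/'.
        None => trimmed.replacen(':', "/", 1),
    };
    // Drop any `user@` prefix in front of the host.
    let without_user = match without_scheme.split_once('@') {
        Some((_user, rest)) => rest,
        None => without_scheme.as_str(),
    };
    let without_slash = without_user.trim_end_matches('/');
    without_slash
        .strip_suffix(".git")
        .unwrap_or(without_slash)
        .to_string()
}
```

Both `git@github.com:openai/codex.git` and `https://github.com/openai/codex.git` reduce to `github.com/openai/codex`, so a hash taken over the canonical form joins the two.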
## Verification
- `cargo test -p codex-analytics`
- `cargo test -p codex-git-utils canonicalize_git_remote_url`
- `just fix -p codex-analytics`
- `just bazel-lock-check`
- `git diff --check`
## Why
Published Python SDK builds depend on an exact `openai-codex-cli-bin`
runtime package, but the release workflow did not publish that runtime
package to PyPI. That left the SDK packaging story incomplete: release
artifacts could produce Codex binaries, but Python users still needed a
matching wheel carrying the platform-specific runtime and helper
executables.
This PR is stacked on #21787 so release jobs can include helper binaries
in runtime wheels: Linux wheels include `bwrap` for sandbox fallback,
and Windows wheels include the signed sandbox/elevation helpers beside
`codex.exe`.
## What changed
- Builds platform-specific `openai-codex-cli-bin` wheels from signed
release binaries on macOS, Linux, and Windows release runners.
- Packages Linux `bwrap` into musllinux runtime wheels.
- Packages Windows sandbox helper executables into Windows runtime
wheels.
- Uploads runtime wheels as GitHub release assets and publishes them to
PyPI using trusted publishing from the `pypi` GitHub environment.
- Keeps the new Python runtime publish job non-blocking so failures need
follow-up but do not fail the Rust release workflow.
- Pins the PyPA publish action to the `v1.13.0` commit SHA for
reproducible release publishing.
- Documents that runtime wheels are platform wheels published through
PyPI trusted publishing.
## Testing
- `ruby -e 'require "yaml"; ARGV.each { |f| YAML.load_file(f); puts "ok
#{f}" }' .github/workflows/rust-release.yml
.github/workflows/rust-release-windows.yml`
- `git diff --check`
CI is the real end-to-end verification for the release workflow path.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Some Codex runtime distributions need helper executables beside the main
bundled binary. Linux sandbox fallback needs a packaged `bwrap` when no
suitable system `bwrap` is available, and Windows sandbox/elevation
needs helper executables discoverable beside `codex.exe`. The checked-in
`openai-codex-cli-bin` template already packages everything under
`codex_cli_bin/bin/**`, but the staging script only copied the main
Codex binary into that directory.
This PR adds the generic staging primitive needed by release workflows
to build complete platform runtime wheels without baking
platform-specific helper names into the package template.
## What changed
- Added repeatable `stage-runtime --resource-binary` support so release
workflows can copy extra executables beside the bundled Codex binary.
- Kept resource selection in workflow code, where the platform target is
known.
- Added tests that verify resource binaries are copied into the staged
runtime package, that the wheel include config covers them, and that the
CLI forwards repeated `--resource-binary` values.
## Testing
- `uv run ruff check scripts/update_sdk_artifacts.py
tests/test_artifact_workflow_and_binaries.py`
- `uv run --extra dev pytest
tests/test_artifact_workflow_and_binaries.py::test_stage_runtime_release_copies_resource_binaries
tests/test_artifact_workflow_and_binaries.py::test_runtime_resource_binaries_are_included_by_wheel_config
tests/test_artifact_workflow_and_binaries.py::test_stage_runtime_stages_binary_without_type_generation`
Full `tests/test_artifact_workflow_and_binaries.py` still has unrelated
schema-normalization drift in the local checkout.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
The earlier PRs add stdio transport support and the config-backed
environment provider, but the feature remains inert until normal Codex
entrypoints construct `EnvironmentManager` with enough context to
discover `CODEX_HOME/environments.toml`. This final stack PR activates
the provider while preserving the legacy `CODEX_EXEC_SERVER_URL`
fallback when no environments file exists.
**Stack position:** this is PR 5 of 5. It is the product wiring PR that
activates the configured environment provider added in PR 4.
## What Changed
- Thread `codex_home` into `EnvironmentManagerArgs`.
- Change `EnvironmentManager::new(...)` to load the provider from
`CODEX_HOME`.
- Preserve legacy behavior by falling back to
`DefaultEnvironmentProvider::from_env()` when `environments.toml` is
absent.
- Make `environments.toml`-backed managers start new threads with all
configured environments, default first, while keeping the legacy env-var
path single-default.
- Update the app-server, TUI, exec, MCP server, connector, prompt-debug,
and thread-manager-sample callsites to pass `codex_home` and handle
provider-loading errors.
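The fallback rule above can be sketched as a small selection function. This is a minimal illustration, not the actual `EnvironmentManager::new(...)` code: `ProviderSource`, `select_provider_source`, and `starts_threads_with_all_environments` are hypothetical names, and the sketch assumes that the mere presence of `environments.toml` under `CODEX_HOME` decides which provider is used.

```rust
use std::path::Path;

/// Hypothetical stand-in for the two provider sources described above.
#[derive(Debug, PartialEq)]
enum ProviderSource {
    /// `CODEX_HOME/environments.toml` exists and drives the environment list.
    EnvironmentsToml,
    /// No file: keep the legacy `CODEX_EXEC_SERVER_URL` behavior.
    LegacyEnvVar,
}

/// Assumption: file presence alone selects the provider.
fn select_provider_source(codex_home: &Path) -> ProviderSource {
    if codex_home.join("environments.toml").is_file() {
        ProviderSource::EnvironmentsToml
    } else {
        ProviderSource::LegacyEnvVar
    }
}

/// Only the TOML-backed provider starts new threads with every configured
/// environment; the legacy path stays single-default.
fn starts_threads_with_all_environments(source: &ProviderSource) -> bool {
    matches!(source, ProviderSource::EnvironmentsToml)
}

fn main() {
    // Illustrative only: the temp directory may or may not contain the file.
    let codex_home = std::env::temp_dir();
    let source = select_provider_source(&codex_home);
    println!(
        "{:?} (all environments on new threads: {})",
        source,
        starts_threads_with_all_environments(&source)
    );
}
```

Tying the multi-environment startup to the provider source, rather than to the environment count, mirrors the reasoning in the self-review notes that follow.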
## Self-Review Notes
- The multi-environment startup path is intentionally tied to the
`environments.toml` provider. Using `>1` configured environment as the
only signal would also expand the legacy `CODEX_EXEC_SERVER_URL`
provider because it keeps `local` addressable alongside `remote`.
- The startup environment list is still derived inside
`EnvironmentManager`; the provider only says whether its snapshot should
start new threads with all configured environments.
- The thread-manager sample was updated to pass the current
`ThreadManager::new(...)` installation id argument so the stack compiles
under Bazel.
## Stack
- 1. https://github.com/openai/codex/pull/20663 - Add stdio exec-server
listener
- 2. https://github.com/openai/codex/pull/20664 - Add stdio exec-server
client transport
- 3. https://github.com/openai/codex/pull/20665 - Make environment
providers own default selection
- 4. https://github.com/openai/codex/pull/20666 - Add CODEX_HOME
environments TOML provider
- **5. This PR:** https://github.com/openai/codex/pull/20667 - Load
configured environments from CODEX_HOME
Split from original draft: https://github.com/openai/codex/pull/20508
## Validation
- `just fmt`
- `git diff --check`
- `bazel build --config=remote --strategy=remote
--remote_download_toplevel
//codex-rs/thread-manager-sample:codex-thread-manager-sample`
- `bazel test --config=remote --strategy=remote
--remote_download_toplevel
//codex-rs/exec-server:exec-server-unit-tests`
- `bazel test --config=remote --strategy=remote
--remote_download_toplevel --test_sharding_strategy=disabled
--test_arg=default_thread_environment_selections_use_manager_default_id
//codex-rs/core:core-unit-tests`
- `bazel test --config=remote --strategy=remote
--remote_download_toplevel --test_sharding_strategy=disabled
--test_arg=start_thread_uses_all_default_environments_from_codex_home
//codex-rs/core:core-unit-tests`
## Documentation
This activates `CODEX_HOME/environments.toml`; user-facing documentation
should be added before this stack is treated as a documented public
workflow.
---------
Co-authored-by: Codex <noreply@openai.com>
* Pass installation ID for storage on enrollments server for
deduping/grouping multiple appservers per installation
* Pass installation ID in remoteControl/status/changed events
This does two things:
- We use `persist-credentials: false` everywhere now. This is
unfortunately not the default in GitHub Actions, but it prevents
`actions/checkout` from dropping `secrets.GITHUB_TOKEN` onto disk.
- We interpose (some) template expansions through environment variables.
I've limited this to contexts that have non-fixed values; contexts that
are fixed (like `*.result`) are not dangerous to expand directly inline
(but maybe we should clean those up in the future for consistency
anyways).
This is a medium-risk change in terms of CI breakage: I did a scan for
usage of `git push` and other commands that implicitly use the persisted
credential, but couldn't find any. Even still, some implicit usages of
the persisted credentials may be lurking. Please ping ww@ if any issues
arise.
## Summary
Codex keeps trying to add documentation to the `docs/` directory. With
the exception of app server API documentation, the docs for Codex should
not live in this repo. We don't want the local `docs/` folder to become
a stale shadow of the official docs.
This PR updates `AGENTS.md` to make that boundary explicit and scopes
the existing API documentation guidance to app-server docs/examples. It
also removes the extra `docs/config.md` sections that were recently
added.
## Why
`/fast` was wired as a one-off slash command even though model metadata
now exposes service tiers as catalog data. That meant adding another
tier, such as a slower/cheaper tier, would require more hardcoded TUI
plumbing instead of letting the model catalog drive the available
commands.
This change makes service-tier commands data-driven: each advertised
`service_tiers` entry becomes a `/name` command using the catalog
description, while the request path sends the tier `id` only when the
selected model supports it.
## What Changed
- Removed the hardcoded `/fast` slash-command variant and introduced
dynamic service-tier command items in the composer and command popup.
- Added toggle behavior for service-tier commands: invoking `/name`
selects that tier, and invoking it again clears the selection.
- Preserved the existing Fast-mode keybinding/status affordances by
resolving the current model tier whose name is `fast`, while still
sending the tier request value such as `priority`.
- Persisted service-tier selections as raw request strings so non-fast
tiers can round-trip through config.
- Updated the Bedrock catalog entry to advertise fast support through
`service_tiers` with `id: "priority"` and `name: "fast"`.
- Added defensive filtering in core so unsupported selected service
tiers are omitted from `/responses` requests.
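A rough sketch of the data-driven flow described above, assuming a simplified catalog entry with `id`, `name`, and `description` fields. The struct and all three function names are hypothetical and do not reflect the real TUI or core types.

```rust
/// Hypothetical shape of one catalog `service_tiers` entry.
#[derive(Debug, Clone)]
struct ServiceTier {
    id: String,          // request value, e.g. "priority"
    name: String,        // slash-command name, e.g. "fast"
    description: String, // shown in the command popup
}

/// Find the tier whose `/name` command was typed (leading slash optional).
fn tier_for_command<'a>(tiers: &'a [ServiceTier], command: &str) -> Option<&'a ServiceTier> {
    let name = command.trim_start_matches('/');
    tiers.iter().find(|tier| tier.name == name)
}

/// Toggle: invoking the command for the active tier clears the selection.
fn toggle_tier(current: Option<String>, tier: &ServiceTier) -> Option<String> {
    if current.as_deref() == Some(tier.id.as_str()) {
        None
    } else {
        Some(tier.id.clone())
    }
}

/// Defensive filter: only send a tier id that the selected model advertises.
fn tier_for_request<'a>(selected: Option<&'a str>, tiers: &[ServiceTier]) -> Option<&'a str> {
    selected.filter(|id| tiers.iter().any(|tier| tier.id == *id))
}

fn main() {
    // The description text here is made up for illustration.
    let tiers = vec![ServiceTier {
        id: "priority".to_string(),
        name: "fast".to_string(),
        description: "Faster responses".to_string(),
    }];
    let tier = tier_for_command(&tiers, "/fast").expect("tier is advertised");
    println!("/{}: {}", tier.name, tier.description);
    let selected = toggle_tier(None, tier);
    println!("request tier: {:?}", tier_for_request(selected.as_deref(), &tiers));
}
```

Note that the command is looked up by `name` while the request carries `id`, which is why `/fast` can map to the request value `priority`.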
## Validation
- Added/updated coverage for dynamic service-tier slash command lookup,
popup descriptions, composer dispatch, TUI fast toggling, and
unsupported-tier omission in core request construction.
- Local tests were not run per request.
---------
Co-authored-by: Codex <noreply@openai.com>
Cargo uses libgit2 by default. In uv, we gave up on libgit2 entirely and
always call out to the git CLI because it is much more reliable. This is
part of my attempt to reduce flakes in `rust-ci-full`.
Since https://github.com/openai/codex/pull/21255, `rust-ci-full` has
been failing due to a missing `bwrap`.
```
thread 'main' panicked at linux-sandbox/src/launcher.rs:43:13:
bubblewrap is unavailable: no system bwrap was found on PATH and no bundled codex-resources/bwrap binary was found next to the Codex executable
```
Since the happy path is now to use the system binary, let's ensure
that's installed.
8d51826631
was necessary for the `bwrap` executable to be discoverable when the
working directory is `/`.
I ran `rust-ci-full` at
https://github.com/openai/codex/actions/runs/25528074506
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
1. Removes the broad `DARWIN_USER_CACHE_DIR` write rule from the macOS
Seatbelt network policy.
2. Removes the now unused policy parameter plumbing for that cache path.
3. Adds sandboxing coverage that keeps `com.apple.trustd.agent` for TLS
while rejecting the cache write rule.
## Why
This closes the exact cache poisoning boundary. The earlier `gh` TLS
issue is now covered by trustd access, so the cache write is no longer
needed.
## Validation
1. Rust formatting passed.
2. The sandboxing crate tests passed.
3. Local macOS Seatbelt repro with patched policy passed. `gh api`
returned `21442` without the cache write rule.
Provider initialization installs process-global OTEL state, so invalid
trace metadata needs to fail before setup begins.
Use the same span attribute validator as config loading when traces are
exported so provider startup enforces the config contract without
duplicating validation logic.
## Why
Some consumers expect conventional hyphenated HTTP headers. Codex
already sends the session and thread IDs on outbound Responses requests,
but it only uses the underscore spellings today, which makes those IDs
harder to consume in systems that normalize or reject underscore header
names.
Full context here:
https://openai.slack.com/archives/C08KCGLSPSQ/p1778248578422369
## What changed
- `build_session_headers` now emits both `session_id` and `session-id`
when a session ID is present.
- It does the same for `thread_id` and `thread-id`.
- Added regression coverage in `codex-api/tests/clients.rs` and
`core/tests/suite/client.rs` so both the lower-level client tests and
the end-to-end request tests assert the two header spellings are
present.
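As an illustration of the dual-spelling behavior, here is a minimal sketch. The signature and return type are assumptions made for this example; the real `build_session_headers` may take different arguments and build a proper header map.

```rust
/// Sketch only: emit both the underscore and hyphen spellings for each id
/// that is present. The signature is hypothetical.
fn build_session_headers(
    session_id: Option<&str>,
    thread_id: Option<&str>,
) -> Vec<(&'static str, String)> {
    let mut headers = Vec::new();
    if let Some(id) = session_id {
        headers.push(("session_id", id.to_string()));
        headers.push(("session-id", id.to_string()));
    }
    if let Some(id) = thread_id {
        headers.push(("thread_id", id.to_string()));
        headers.push(("thread-id", id.to_string()));
    }
    headers
}

fn main() {
    for (name, value) in build_session_headers(Some("sess-1"), Some("thread-1")) {
        println!("{}: {}", name, value);
    }
}
```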
## Test plan
- Added header assertions in `codex-api/tests/clients.rs`.
- Added request-header assertions in `core/tests/suite/client.rs` for
both the `/v1/responses` and `/api/codex/responses` request paths.
Fixes #21665.
## Why
The TUI status line is the right place for compact, glanceable session
state. The original request was motivated by the need to see the active
permission posture without opening `/permissions` or `/status`,
especially when switching between safer and more permissive modes during
a session.
This PR intentionally separates `permissions` from `approval-mode`
instead of combining them into one status-line item. They answer related
but different questions: `permissions` describes the active
sandbox/profile shape, while `approval-mode` describes how command
approvals are handled. Keeping them separate makes each item
independently configurable and avoids long combined labels in an already
space-constrained status line.
The tradeoff is that users who want the full permission posture in the
status line need to opt into both items. In exchange, users can show
only the sandbox/profile label, only the approval behavior, or both, and
named user-defined profiles remain concise. Non-standard permission
shapes are rendered as `Custom permissions` rather than trying to
squeeze detailed profile contents into the status line; `/status`
remains the fuller explanatory surface.
## What changed
- Added a configurable `permissions` status-line item.
- Added a separate `approval-mode` status-line item, with `approval` as
an alias.
- Render standard permission states compactly as `Read Only`,
`Workspace`, or `Full Access`.
- Preserve user-defined permission profile names directly in the status
line.
- Render unnamed non-standard permission shapes as `Custom permissions`.
- Refresh status surfaces when `/permissions` updates the permission
profile, approval policy, or approval reviewer.
- Updated status-line preview snapshot coverage for the new items.
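The label rules listed above could look roughly like this. `PermissionProfile` and `permissions_label` are hypothetical names for this sketch; the real TUI derives the label from richer sandbox and profile state.

```rust
/// Hypothetical simplification of the permission states named above.
enum PermissionProfile {
    ReadOnly,
    WorkspaceWrite,
    FullAccess,
    /// A user-defined profile with a name.
    Named(String),
    /// A non-standard shape without a name.
    Unnamed,
}

/// Compact status-line label for the `permissions` item.
fn permissions_label(profile: &PermissionProfile) -> String {
    match profile {
        PermissionProfile::ReadOnly => "Read Only".to_string(),
        PermissionProfile::WorkspaceWrite => "Workspace".to_string(),
        PermissionProfile::FullAccess => "Full Access".to_string(),
        // User-defined names are preserved as-is.
        PermissionProfile::Named(name) => name.clone(),
        PermissionProfile::Unnamed => "Custom permissions".to_string(),
    }
}

fn main() {
    let profiles = [
        PermissionProfile::ReadOnly,
        PermissionProfile::WorkspaceWrite,
        PermissionProfile::FullAccess,
        PermissionProfile::Named("ci-runner".to_string()),
        PermissionProfile::Unnamed,
    ];
    for profile in profiles.iter() {
        println!("{}", permissions_label(profile));
    }
}
```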
## Verification
- `cargo test -p codex-tui
status_permissions_non_default_workspace_write_uses_workspace_label`
- `cargo test -p codex-tui
permissions_selection_emits_history_cell_when_selection_changes`
- `cargo insta pending-snapshots --manifest-path tui/Cargo.toml`
## Why
The configurable `/statusline` and terminal title can display session
token usage. That display was using the raw total token count, which
includes cached input tokens, so it significantly overstated the token
usage compared with the blended token count shown elsewhere (in
`/status` and tracked in goals). This inconsistency resulted in user
confusion. We don't want to report cached tokens because we don't charge
for them and they are somewhat of an implementation detail that users
shouldn't care about.
## What changed
- Use `TokenUsage::blended_total()` for the `used-tokens` status surface
item so cached input is excluded.
- Add a brief comment to `tokens_in_context_window()` clarifying that it
returns raw `total_tokens`, whose meaning depends on whether the caller
has last-turn or accumulated usage.
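To make the difference concrete, here is a small sketch that assumes the blended total is non-cached input plus output. The field names and the exact formula are assumptions for illustration; the real `TokenUsage` type may track more fields or compute the value differently.

```rust
/// Simplified, hypothetical token-usage record.
struct TokenUsage {
    input_tokens: i64,
    cached_input_tokens: i64,
    output_tokens: i64,
    total_tokens: i64,
}

impl TokenUsage {
    /// Assumed definition: input that was not served from cache, plus output.
    fn blended_total(&self) -> i64 {
        let non_cached_input = (self.input_tokens - self.cached_input_tokens).max(0);
        non_cached_input + self.output_tokens.max(0)
    }

    /// Raw total, which still includes cached input tokens.
    fn tokens_in_context_window(&self) -> i64 {
        self.total_tokens
    }
}

fn main() {
    let usage = TokenUsage {
        input_tokens: 1_000,
        cached_input_tokens: 800,
        output_tokens: 50,
        total_tokens: 1_050,
    };
    println!("raw total: {}", usage.tokens_in_context_window());
    println!("blended total: {}", usage.blended_total());
}
```

With mostly cached input, the raw total (1,050 here) is several times the blended total (250), which is the kind of overstatement the change removes from the status surface.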
## Summary
- enable `apply_patch_freeform` by default in the feature registry
## Why
- make the freeform `apply_patch` tool available by default when model
metadata does not explicitly opt into another mode
## Validation
- `just fmt`
- did not run tests
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`service_tier` in `config.toml` and profile config was still modeled as
an enum, which blocked newer or experimental service tier IDs even
though the runtime paths already carry string IDs.
This change makes the TOML-facing config accept string service tier IDs
directly while keeping the legacy `fast` alias behavior by normalizing
it to the request value `priority`.
## What Changed
- change the TOML-facing `service_tier` fields in global and profile
config to `Option<String>`
- keep config-load normalization so legacy `fast` still resolves to
`priority`
- persist resolved service tier strings directly in config locks so
arbitrary IDs round-trip cleanly
- regenerate the config schema and add config coverage for arbitrary
string IDs plus legacy `fast` normalization
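The normalization step can be sketched as follows. `normalize_service_tier` is a hypothetical helper name; the tier id `flex` below is only an example of an arbitrary string passing through unchanged.

```rust
/// Sketch: accept any string tier id, but map the legacy `fast` alias to the
/// request value `priority`.
fn normalize_service_tier(raw: Option<&str>) -> Option<String> {
    raw.map(|tier| {
        if tier == "fast" {
            "priority".to_string()
        } else {
            tier.to_string()
        }
    })
}

fn main() {
    println!("{:?}", normalize_service_tier(Some("fast")));
    println!("{:?}", normalize_service_tier(Some("flex")));
    println!("{:?}", normalize_service_tier(None));
}
```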
## Verification
- added config tests for arbitrary string service tiers and legacy
`fast` normalization
- ran `just write-config-schema`
- CI
---------
Co-authored-by: Codex <noreply@openai.com>
## What changed
- rewrote `shutdown_flushes_pending_metadata_irrelevant_updated_at` to
seed an existing pending `updated_at` touch directly in
`RolloutWriterState`
- kept the shutdown test focused on draining a pending touch, leaving
the separate coalescing test to cover timing-based deferral
## Why
The old test had to complete several async operations inside the 50 ms
test-only coalescing window. When that sequence took longer, the second
flush updated `threads.updated_at` immediately and the pre-shutdown
equality assertion failed, even though shutdown behavior was correct.
## Validation
- `cargo test -p codex-rollout
shutdown_flushes_pending_metadata_irrelevant_updated_at`
- `cargo test -p codex-rollout`
Co-authored-by: Codex <noreply@openai.com>
## Summary
API-key-auth remote compaction requests should not inherit
`service_tier` from normal `/responses` turns. This path needs to match
API auth expectations, while ChatGPT-auth remote compaction should keep
reusing the shared request fields that still apply there.
This change keeps the decision inline in
`codex-rs/core/src/compact_remote.rs` only. Under API key auth, the
classic remote `/responses/compact` path now omits `service_tier`; under
ChatGPT auth, it keeps reusing the configured tier.
`codex-rs/core/src/compact_remote_v2.rs` is unchanged. The remote
compaction parity coverage and snapshots were updated to assert the
API-key omission and preserve the ChatGPT-auth behavior.
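The inline decision can be pictured as a small function of the auth mode. `AuthMode` and `compaction_service_tier` are hypothetical names for this sketch and are not taken from `compact_remote.rs`.

```rust
/// Hypothetical two-way auth distinction used only for this sketch.
#[derive(Clone, Copy)]
enum AuthMode {
    ApiKey,
    ChatGpt,
}

/// Tier to send on the classic remote `/responses/compact` request:
/// omitted under API key auth, reused from config under ChatGPT auth.
fn compaction_service_tier(auth: AuthMode, configured: Option<&str>) -> Option<&str> {
    match auth {
        AuthMode::ApiKey => None,
        AuthMode::ChatGpt => configured,
    }
}

fn main() {
    let configured = Some("priority");
    println!("api key: {:?}", compaction_service_tier(AuthMode::ApiKey, configured));
    println!("chatgpt: {:?}", compaction_service_tier(AuthMode::ChatGpt, configured));
}
```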
## Testing
- Updated remote compaction parity coverage in
`codex-rs/core/tests/suite/compact_remote.rs` and the corresponding
snapshots.
## Why
`codex exec` still included the stale `research preview` label in its
human-readable startup banner, which makes the CLI look older and less
current than it is.
Fixes #21444.
## What Changed
Removed the hard-coded ` (research preview)` suffix from the `OpenAI
Codex v<version>` startup banner in
`codex-rs/exec/src/event_processor_with_human_output.rs`.
## Validation
Local validation was not required for this one-line startup banner text
cleanup.
Fixes #20870.
## Summary
The feature request template currently links users to the README
`#contributing` anchor, but that anchor does not exist. This can confuse
users who are trying to understand contribution expectations before
filing a request.
This updates `.github/ISSUE_TEMPLATE/5-feature-request.yml` to point
`Contributing` at `docs/contributing.md`, matching the repository's
existing contribution guidance.
Issue forms should only reference labels that exist in the repository so
new reports receive the intended automatic labels.
This updates the CLI issue form to stop applying the missing `needs
triage` label, and changes the documentation issue form from `docs` to
the existing `documentation` label.
Fixes #21158.
Fixes #21270.
The CLI bug report template defined `description` twice for the terminal
emulator field. Because duplicate YAML keys are ambiguous and parsers
generally keep the later value, the form could drop the multiplexer
guidance.
This combines that guidance with the terminal examples under a single
block scalar in `.github/ISSUE_TEMPLATE/3-cli.yml`.
Requires discoverability on plugin/share/updateTargets so the server can
manage workspace link access consistently, including auto-adding the
workspace principal for UNLISTED.
Also rejects LISTED on share creation and blocks client-supplied
workspace principals while preserving response parsing for LISTED.
## Summary
Codex's Amazon Bedrock provider signs Mantle requests with SigV4 using
credentials resolved by the AWS SDK. That worked for standard AWS
profiles and environment credentials, but AWS CLI console-login profiles
created by `aws login` require the SDK's `credentials-login` feature to
resolve `login_session` credentials.
This change enables that credential provider so Bedrock can use AWS
console-login credentials through the existing provider-owned AWS auth
path.
While testing the console-login path, we also hit a Mantle-specific
SigV4 regression from the new split between `session_id` and
`thread_id`. Mantle does not preserve legacy OpenAI compatibility
headers that use `snake_case` before SigV4 verification, so signing
those headers can make the server reconstruct a different canonical
request. The Bedrock auth path now removes that header class before
signing, keeping preserved hyphenated Codex/AWS headers such as
`x-codex-turn-metadata` signed normally.
## Changes
- Enable `aws-config`'s `credentials-login` feature in
`codex-rs/aws-auth`.
- Add a compile-time regression test for
`aws_config::login::LoginCredentialsProvider`.
- Strip `snake_case` compatibility headers from Bedrock Mantle SigV4
requests before signing.
- Expand the Bedrock auth regression test to cover `session_id`,
`thread_id`, and future headers of the same shape.
- Refresh Cargo and Bazel lockfiles for the added `aws-sdk-signin`
dependency.
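A minimal sketch of the header filtering step, assuming that a `snake_case` compatibility header is any header whose name contains an underscore. That matching rule and the function name `strip_snake_case_headers` are assumptions for illustration; the actual Bedrock auth code may classify headers differently.

```rust
/// Sketch: drop headers whose names contain `_` before computing the SigV4
/// signature, keeping hyphenated headers so they are signed normally.
fn strip_snake_case_headers(headers: Vec<(String, String)>) -> Vec<(String, String)> {
    headers
        .into_iter()
        .filter(|(name, _)| !name.contains('_'))
        .collect()
}

fn main() {
    let headers = vec![
        ("session_id".to_string(), "abc".to_string()),
        ("thread_id".to_string(), "def".to_string()),
        ("x-codex-turn-metadata".to_string(), "{}".to_string()),
    ];
    for (name, _) in strip_snake_case_headers(headers) {
        println!("signing header: {}", name);
    }
}
```

Filtering by shape rather than by a fixed list is what lets future headers of the same form be covered without another code change.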
## Tests
- tested with `aws login` locally and verified that it works as
intended.
## Summary
- Remove `perCwdExtraUserRoots` / `SkillsListExtraRootsForCwd` from the
`skills/list` app-server API.
- Drop Rust app-server and `codex-core-skills` extra-root plumbing so
skill scans are keyed by the normal cwd/user/plugin roots only.
- Regenerate app-server schemas and update docs/tests that only existed
for the removed extra-roots behavior.
## Validation
- `just write-app-server-schema`
- `just fmt`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core-skills`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-core-skills`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
## Notes
- `cargo test -p codex-app-server --test all skills_list` ran the edited
skills-list cases, but the full filtered run ended on existing
`skills_changed_notification_is_emitted_after_skill_change` timeout
after a websocket `401`.
- `cargo test -p codex-tui --lib` compiled the changed TUI callers, then
failed two unrelated status permission tests because local
`/etc/codex/requirements.toml` forbids `DangerFullAccess`.
- Source-truth check found the OpenAI monorepo still has
generated/app-server-kit mirror references to the removed field; those
should be cleaned up when generated app-server types are synced or in a
companion OpenAI cleanup.
## Why
We want terminal tool review analytics, but the reducer should not stamp
review timing from its own wall clock.
This PR plumbs review timing through the real protocol and app-server
seams so downstream analytics can consume the emitter's timestamps
directly. Guardian reviews keep their enriched `started_at` /
`completed_at` analytics fields by deriving those legacy second-based
values from the same protocol-native millisecond lifecycle timestamps,
rather than sampling a separate analytics clock.
## What changed
- add `started_at_ms` to user approval request payloads
- add `started_at_ms` / `completed_at_ms` to guardian review
notifications
- preserve Guardian review `started_at` / `completed_at` enrichment from
the protocol-native timing source
- stamp typed `ServerResponse` analytics facts with app-server-observed
`completed_at_ms`
- thread the new timing fields through core, protocol, app-server, TUI,
and analytics fixtures
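Deriving the legacy second-based fields from the millisecond timestamps might look like the sketch below. `ReviewTiming` is a hypothetical struct, and rounding down to whole seconds is an assumption; the source only says the second-based values come from the same millisecond timestamps.

```rust
/// Hypothetical container for the protocol-native lifecycle timestamps.
struct ReviewTiming {
    started_at_ms: i64,
    completed_at_ms: i64,
}

impl ReviewTiming {
    /// Legacy `started_at` in seconds, derived rather than sampled separately.
    fn started_at(&self) -> i64 {
        self.started_at_ms.div_euclid(1000)
    }

    /// Legacy `completed_at` in seconds from the same timing source.
    fn completed_at(&self) -> i64 {
        self.completed_at_ms.div_euclid(1000)
    }
}

fn main() {
    let timing = ReviewTiming {
        started_at_ms: 1_778_248_578_422,
        completed_at_ms: 1_778_248_580_010,
    };
    println!("started_at={} completed_at={}", timing.started_at(), timing.completed_at());
}
```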
## Verification
- `cargo test -p codex-app-server outgoing_message --manifest-path
codex-rs/Cargo.toml`
- `cargo test -p codex-app-server-protocol guardian --manifest-path
codex-rs/Cargo.toml`
- `cargo test -p codex-tui guardian --manifest-path codex-rs/Cargo.toml`
- `cargo test -p codex-analytics analytics_client_tests --manifest-path
codex-rs/Cargo.toml`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/21434).
* #18748
* __->__ #21434
* #18747
* #17090
* #17089
* #20514
## Why
Remote compaction v2 consumes a normal Responses stream, but that
compaction-specific stream consumer dropped the `response.completed` id.
As a result, the `responses_websocket_response_processed` lifecycle
notification was emitted for normal turn sampling but not after a v2
remote compaction response was fully processed.
## What changed
- Return the completed response id alongside the v2 `context_compaction`
output item.
- After v2 compacted history is installed, send `response.processed`
through the same websocket session when the feature is enabled.
- Add websocket regression coverage for a remote compaction v2 request
followed by `response.processed`.
## Verification
- `cargo test -p codex-core --test all
responses_websocket_sends_response_processed_after_remote_compaction_v2
-- --nocapture`
- `cargo test -p codex-core
collect_context_compaction_output_accepts_additional_output_items --
--nocapture`
## Why
After stdio transports and provider-owned defaults exist, Codex needs a
config-backed provider that can describe more than the single legacy
`CODEX_EXEC_SERVER_URL` remote. This PR adds that provider without
activating it in product entrypoints yet, keeping parser/validation
review separate from runtime wiring.
**Stack position:** this is PR 4 of 5. It builds on PR 3's
provider/default model and adds the `environments.toml` provider used by
PR 5.
## What Changed
- Add `environment_toml.rs` as the TOML-specific home for parsing,
validation, and provider construction.
- Keep the TOML schema/provider structs private; the public constructor
added here is `EnvironmentManager::from_codex_home(...)`.
- Add `TomlEnvironmentProvider`, including validation for:
- reserved ids such as `local` and `none`
- duplicate ids
- unknown explicit defaults
- empty programs or URLs
- exactly one of `url` or `program` per configured environment
- Support websocket environments with `url = "ws://..."` / `wss://...`.
- Support stdio-command environments with `program = "..."`.
- Add helpers to load `environments.toml` from `CODEX_HOME`, but do not
wire entrypoints to call them yet.
- Add the `toml` dependency for parsing.
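The validation rules listed above can be sketched without a TOML parser by working on already-parsed entries. Everything here is hypothetical (`EnvironmentEntry`, `entry`, `validate_environments`, the error strings), and the sketch assumes an explicit default must name a configured environment; the real `TomlEnvironmentProvider` may accept other ids or check in a different order.

```rust
use std::collections::HashSet;

/// Hypothetical parsed entry; the real TOML schema structs are private.
struct EnvironmentEntry {
    id: String,
    url: Option<String>,
    program: Option<String>,
}

/// Convenience constructor for this sketch.
fn entry(id: &str, url: Option<&str>, program: Option<&str>) -> EnvironmentEntry {
    EnvironmentEntry {
        id: id.to_string(),
        url: url.map(str::to_string),
        program: program.map(str::to_string),
    }
}

const RESERVED_IDS: [&str; 2] = ["local", "none"];

fn validate_environments(
    entries: &[EnvironmentEntry],
    default_id: Option<&str>,
) -> Result<(), String> {
    let mut seen = HashSet::new();
    for item in entries {
        if RESERVED_IDS.contains(&item.id.as_str()) {
            return Err(format!("environment id `{}` is reserved", item.id));
        }
        if !seen.insert(item.id.as_str()) {
            return Err(format!("duplicate environment id `{}`", item.id));
        }
        match (&item.url, &item.program) {
            // Exactly one transport, and it is non-empty.
            (Some(url), None) if !url.trim().is_empty() => {}
            (None, Some(program)) if !program.trim().is_empty() => {}
            (Some(_), None) | (None, Some(_)) => {
                return Err(format!("environment `{}` has an empty url or program", item.id));
            }
            _ => {
                return Err(format!(
                    "environment `{}` must set exactly one of `url` or `program`",
                    item.id
                ));
            }
        }
    }
    // Assumption: an explicit default must refer to a configured id.
    if let Some(id) = default_id {
        if !seen.contains(id) {
            return Err(format!("default environment `{}` is not configured", id));
        }
    }
    Ok(())
}

fn main() {
    let entries = vec![
        entry("devbox", Some("ws://127.0.0.1:8080"), None),
        entry("container", None, Some("my-exec-server")),
    ];
    println!("{:?}", validate_environments(&entries, Some("devbox")));
}
```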
## Stack
- 1. https://github.com/openai/codex/pull/20663 - Add stdio exec-server
listener
- 2. https://github.com/openai/codex/pull/20664 - Add stdio exec-server
client transport
- 3. https://github.com/openai/codex/pull/20665 - Make environment
providers own default selection
- **4. This PR:** https://github.com/openai/codex/pull/20666 - Add
CODEX_HOME environments TOML provider
- 5. https://github.com/openai/codex/pull/20667 - Load configured
environments from CODEX_HOME
Split from original draft: https://github.com/openai/codex/pull/20508
## Validation
Not run locally; this was split out of the original draft stack.
## Documentation
This introduces the config shape for `environments.toml`; user-facing
documentation should be added before this stack is treated as a
documented public workflow.
---------
Co-authored-by: Codex <noreply@openai.com>
Route `view_image` through selected environments so image reads use the selected turn environment and cwd, with schema exposure limited to multi-environment toolsets.
Co-authored-by: Codex <noreply@openai.com>
## Why
The next PR in this stack introduces configured environments, where the
provider knows both which environments exist and which one should be
selected by default. The existing manager derived the default internally
by checking for the legacy `remote` and `local` ids, and it treated
"remote" as equivalent to "has a websocket URL." That does not work
cleanly for stdio-command remotes because they are remote environments
without an `exec_server_url`.
**Stack position:** this is PR 3 of 5. It is the environment-model
bridge between PR 2's transport enum and PR 4's TOML provider.
## What Changed
- Add `DefaultEnvironmentSelection` to the `EnvironmentProvider`
contract:
- `Derived` preserves the old `remote`-then-`local` fallback behavior.
- `Environment(id)` lets a provider explicitly select a configured
default.
- `Disabled` lets a provider intentionally expose no default
environment.
- Move the legacy `CODEX_EXEC_SERVER_URL=none` default-disabling
behavior into `DefaultEnvironmentProvider`.
- Make `EnvironmentManager` validate explicit provider defaults and
return an error if the selected id is missing.
- Track `remote_transport` separately from `exec_server_url` so
stdio-command environments are still recognized as remote.
- Add `Environment::remote_stdio_shell_command(...)` for the TOML
provider added in the next PR.
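The three selection modes can be sketched as an enum plus a resolver. The variant names follow the description above, but `resolve_default`, its signature, and its error text are hypothetical; the real `EnvironmentManager` validation may differ.

```rust
/// Variants as described above; the resolver below is an illustrative sketch.
enum DefaultEnvironmentSelection {
    /// Legacy behavior: prefer `remote`, then `local`.
    Derived,
    /// The provider explicitly names a configured default.
    Environment(String),
    /// The provider intentionally exposes no default.
    Disabled,
}

/// Resolve the default environment id against the configured ids.
fn resolve_default(
    selection: &DefaultEnvironmentSelection,
    configured_ids: &[&str],
) -> Result<Option<String>, String> {
    match selection {
        DefaultEnvironmentSelection::Derived => {
            for candidate in ["remote", "local"] {
                if configured_ids.contains(&candidate) {
                    return Ok(Some(candidate.to_string()));
                }
            }
            Ok(None)
        }
        DefaultEnvironmentSelection::Environment(id) => {
            if configured_ids.contains(&id.as_str()) {
                Ok(Some(id.clone()))
            } else {
                // An explicit default that is missing is an error.
                Err(format!("default environment `{}` is not configured", id))
            }
        }
        DefaultEnvironmentSelection::Disabled => Ok(None),
    }
}

fn main() {
    let ids = ["local", "remote"];
    println!("{:?}", resolve_default(&DefaultEnvironmentSelection::Derived, &ids));
    println!(
        "{:?}",
        resolve_default(&DefaultEnvironmentSelection::Environment("devbox".to_string()), &ids)
    );
    println!("{:?}", resolve_default(&DefaultEnvironmentSelection::Disabled, &ids));
}
```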
## Stack
- 1. https://github.com/openai/codex/pull/20663 - Add stdio exec-server
listener
- 2. https://github.com/openai/codex/pull/20664 - Add stdio exec-server
client transport
- **3. This PR:** https://github.com/openai/codex/pull/20665 - Make
environment providers own default selection
- 4. https://github.com/openai/codex/pull/20666 - Add CODEX_HOME
environments TOML provider
- 5. https://github.com/openai/codex/pull/20667 - Load configured
environments from CODEX_HOME
Split from original draft: https://github.com/openai/codex/pull/20508
## Validation
Not run locally; this was split out of the original draft stack.
---------
Co-authored-by: Codex <noreply@openai.com>
Remove the remote thread-store backend and checked-in protobuf
artifacts. We've moved these into another crate that links against this
one.
Also remove the config settings for thread store backend selection,
since we'll instead pass an instantiated thread store into the core-api
crate's main entrypoint.
2026-05-08 00:02:46 +00:00
1152 changed files with 87751 additions and 33414 deletions
@@ -53,7 +53,7 @@ Use `--window "past week"` or `--window-hours 168` when the user asks for a non-
## Summary
No major issues reported by users.
Source: collector v4, git `abc123def456`, window `2026-04-27T00:00:00Z` to `2026-04-28T00:00:00Z`.
Source: collector v5, git `abc123def456`, window `2026-04-27T00:00:00Z` to `2026-04-28T00:00:00Z`.
Want details? I can expand this into the issue table.
```
@@ -65,7 +65,7 @@ Two issues are being surfaced by users:
🔥🔥 Terminal launch hangs on startup [1](https://github.com/openai/codex/issues/123)
🔥 Resume switches model providers unexpectedly [2](https://github.com/openai/codex/issues/456)
Source: collector v4, git `abc123def456`, window `2026-04-27T00:00:00Z` to `2026-04-28T00:00:00Z`.
Source: collector v5, git `abc123def456`, window `2026-04-27T00:00:00Z` to `2026-04-28T00:00:00Z`.
Want details? I can expand this into the issue table.
```
5. In `## Details`, when details are requested, include a compact table only when useful:
@@ -76,7 +76,7 @@ Want details? I can expand this into the issue table.
- A clear quiet/no-concern sentence when there is no meaningful signal.
6. Use the JSON `attention_marker` exactly. It is empty for normal rows, `🔥` for elevated rows, and `🔥🔥` for very high-attention rows. The actual cutoffs are in `attention_thresholds`.
7. Use inline numbered references where a row or bullet points to issues, for example `Compaction bugs [1](https://github.com/openai/codex/issues/123), [2](https://github.com/openai/codex/issues/456)`. Do not add a separate footnotes section.
8. Label `interactions` as `Interactions`; it counts posts/comments/reactions during the requested window, not unique people.
8. Label `interactions` as `Interactions`; it counts unique human GitHub users who created a new issue, added a new comment, or reacted during the requested window. Multiple posts/reactions from the same user on the same issue count once.
9. Mention the collector `script_version`, repo checkout `git_head`, and time window in one compact source line. In default mode, put this before the details prompt so the final line still asks whether the user wants details. In details-upfront mode, it can be the footer.
## Reaction Handling
@@ -89,7 +89,7 @@ GitHub issue search is still seeded by issue `updated_at`, so a purely reaction-
## Attention Markers
The collector scales attention markers by the requested time window. The baseline is 5 human user interactions for `🔥` and 10 for `🔥🔥` over 24 hours; longer or shorter windows scale those cutoffs linearly and round up. For example, a one-week report uses 35 and 70 interactions. Human user interactions are human-authored new issue posts, human-authored new comments, and human reactions created during the window, including upvotes. Bot posts and bot reactions are excluded. In prose, explain this as high user interaction rather than naming the emoji.
The collector scales attention markers by the requested time window. The baseline is 5 unique human users for `🔥` and 10 unique human users for `🔥🔥` over 24 hours; longer or shorter windows scale those cutoffs linearly and round up. For example, a one-week report uses 35 and 70 interactions. Unique human users are users who authored a new issue, authored a new comment, or reacted during the window, including upvotes. Multiple actions from the same user on the same issue count once. Bot posts and bot reactions are excluded. In prose, explain this as high user interaction rather than naming the emoji.
"New issue comments are filtered by comment creation time within the window from the fetched comment set.",
"Reaction events are counted by GitHub reaction created_at timestamps for hydrated issues and fetched comments.",
"Current reaction totals are standing engagement signals; new_reactions and new_upvotes are windowed activity.",
"user_interactions counts unique human users per issue across new issues, new comments, and new reactions; repeated actions by the same user count once.",
"The collector does not assign semantic clusters; use summary_inputs as model-ready evidence for report-time clustering.",
"Pure reaction-only issues may be missed if GitHub issue search does not surface them via updated_at.",
"Issues updated during the window without a new issue body or new comment are retained because label/status edits can still be useful owner signals.",
Make sure you are running the [latest](https://npmjs.com/package/@openai/codex) version of Codex CLI. The bug you are experiencing may already have been fixed.
If your version supports it, please run `codex doctor --json` and paste the output in the "Codex doctor report" field below. This helps us diagnose install, config, auth, terminal, MCP, network, and local state issues.
- type:input
id:version
attributes:
@@ -41,9 +42,19 @@ body:
id:terminal
attributes:
label:What terminal emulator and version are you using (if applicable)?
description:Also note any multiplexer in use (screen / tmux / zellij)
description:|
E.g, VSCode, Terminal.app, iTerm2, Ghostty, Windows Terminal (WSL / PowerShell)
Also note any multiplexer in use (screen / tmux / zellij).
E.g., VS Code, Terminal.app, iTerm2, Ghostty, Windows Terminal (WSL / PowerShell)
- type:textarea
id:doctor
attributes:
label:Codex doctor report
description:|
If available, run `codex doctor --json` and paste the full output here.
The report is designed to redact secrets, but please review it before submitting.
If your Codex version does not support `doctor`, write `not available`.
1. Search existing issues for similar features. If you find one, 👍 it rather than opening a new one.
2. The Codex team will try to balance the varying needs of the community when prioritizing or rejecting new features. Not all features will be accepted. See [Contributing](https://github.com/openai/codex#contributing) for more details.
2. The Codex team will try to balance the varying needs of the community when prioritizing or rejecting new features. Not all features will be accepted. See [Contributing](https://github.com/openai/codex/blob/main/docs/contributing.md) for more details.
You are an assistant that reviews GitHub issues for the repository.
Your job is to choose the most appropriate labels for the issue described later in this prompt.
Follow these rules:
- Add one (and only one) of the following three labels to distinguish the type of issue. Default to "bug" if unsure.
1. bug — Reproducible defects in Codex products (CLI, VS Code extension, web, auth).
2. enhancement — Feature requests or usability improvements that ask for new capabilities, better ergonomics, or quality-of-life tweaks.
3. documentation — Updates or corrections needed in docs/README/config references (broken links, missing examples, outdated keys, clarification requests).
- If applicable, add one of the following labels to specify which sub-product or product surface the issue relates to.
1. CLI — the Codex command line interface.
2. extension — VS Code (or other IDE) extension-specific issues.
3. app - Issues related to the Codex desktop application.
4. codex-web — Issues targeting the Codex web UI/Cloud experience.
5. github-action — Issues with the Codex GitHub action.
6. iOS — Issues with the Codex iOS app.
- Additionally add zero or more of the following labels that are relevant to the issue content. Prefer a small set of precise labels over many broad ones.
- For agent-area issues, prefer the most specific applicable label. Use "agent" only as a fallback for agent-related issues that do not fit a more specific agent-area label. Prefer "app-server" over "session" or "config" when the issue is about app-server protocol, API, RPC, schema, launch, or bridge behavior. Use "memory" for agentic memory storage/retrieval and "performance" for high process memory utilization or memory leaks.
1. windows-os — Bugs or friction specific to Windows environments (always when PowerShell is mentioned, path handling, copy/paste, OS-specific auth or tooling failures).
2. mcp — Topics involving Model Context Protocol servers/clients.
3. mcp-server — Problems related to the codex mcp-server command, where codex runs as an MCP server.
4. azure — Problems or requests tied to Azure OpenAI deployments.
5. model-behavior — Undesirable LLM behavior: forgetting goals, refusing work, hallucinating environment details, quota misreports, or other reasoning/performance anomalies.
6. code-review — Issues related to the code review feature or functionality.
7. safety-check - Issues related to cyber risk detection or trusted access verification.
8. auth - Problems related to authentication, login, or access tokens.
9. exec - Problems related to the "codex exec" command or functionality.
10. hooks - Problems related to event hooks
11. context - Problems related to compaction, context windows, or available context reporting.
12. skills - Problems related to skills or plugins
13. custom-model - Problems that involve using custom model providers, local models, or OSS models.
14. rate-limits - Problems related to token limits, rate limits, or token usage reporting.
15. sandbox - Issues related to local sandbox environments or tool call approvals to override sandbox restrictions.
16. tool-calls - Problems related to specific tool call invocations including unexpected errors, failures, or hangs.
17. TUI - Problems with the terminal user interface (TUI) including keyboard shortcuts, copy & pasting, menus, or screen update issues.
18. app-server - Issues involving the app-server protocol or interfaces, including SDK/API payloads, thread/* and turn/* RPCs, app-server launch behavior, external app/controller bridges, and app-server protocol/schema behavior.
19. connectivity - Network connectivity or endpoint issues, including reconnecting messages, stream dropped/disconnected errors, websocket/SSE/transport failures, timeout/network/VPN/proxy/API endpoint failures, and related retry behavior.
20. subagent - Issues involving subagents, sub-agents, or multi-agent behavior, including spawn_agent, wait_agent, close_agent, worker/explorer roles, delegation, agent teams, lifecycle, model/config inheritance, quotas, and orchestration.
21. session - Issues involving session or thread management, including resume, fork, archive, rename/title, thread history, rollout persistence, compaction, checkpoints, retention, and cross-session state.
29. performance - Issues involving slow, laggy performance, high memory utilization, or memory leaks.
30. automations - Issues involving scheduled automation tasks or heartbeats.
31. pets - Issues involving pets avatars and animations.
32. agent - Fallback only for core agent loop or agent-related issues that do not fit app-server, connectivity, subagent, session, config, plan, computer-use, browser, memory, imagen, remote, performance, automations, or pets.
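The counting rules in the prompt above (exactly one type label, at most one product-surface label, and "agent" only as a fallback) could be checked mechanically on the model's output. The following is a minimal sketch, not part of the repository: the function name `check_label_rules` and the constant names are hypothetical, "one of the following labels" is read here as "at most one", and the fallback list is copied from item 32.

```rust
use std::collections::HashSet;

/// Type labels from the prompt; exactly one must be present.
const TYPE_LABELS: [&str; 3] = ["bug", "enhancement", "documentation"];

/// Product-surface labels from the prompt; this sketch assumes at most one.
const PRODUCT_LABELS: [&str; 6] =
    ["CLI", "extension", "app", "codex-web", "github-action", "iOS"];

/// Labels that item 32 says take precedence over the "agent" fallback.
const MORE_SPECIFIC_THAN_AGENT: [&str; 14] = [
    "app-server", "connectivity", "subagent", "session", "config", "plan",
    "computer-use", "browser", "memory", "imagen", "remote", "performance",
    "automations", "pets",
];

/// Hypothetical helper: validates a proposed label set against the rules above.
pub fn check_label_rules(labels: &[&str]) -> Result<(), String> {
    let unique: HashSet<&str> = labels.iter().copied().collect();

    let type_count = TYPE_LABELS.iter().filter(|l| unique.contains(*l)).count();
    if type_count != 1 {
        return Err(format!("expected exactly one type label, found {}", type_count));
    }

    let product_count = PRODUCT_LABELS.iter().filter(|l| unique.contains(*l)).count();
    if product_count > 1 {
        return Err(format!("expected at most one product label, found {}", product_count));
    }

    // "agent" is a fallback, so it should not appear next to a more specific label.
    if unique.contains("agent")
        && MORE_SPECIFIC_THAN_AGENT.iter().any(|l| unique.contains(*l))
    {
        return Err("\"agent\" used alongside a more specific agent-area label".to_string());
    }

    Ok(())
}
```

A check like this only covers the structural rules; whether a given label actually fits the issue content remains a judgment call for the reviewer.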
@@ -26,7 +26,7 @@ In the codex-rs folder where the rust code lives:
- Implementations may still use `async fn foo(&self, ...) -> T` when they satisfy that contract.
- Do not use `#[allow(async_fn_in_trait)]` as a shortcut around spelling the future contract explicitly.
- When writing tests, prefer comparing the equality of entire objects over fields one by one.
- When making a change that adds or changes an API, ensure that the documentation in the `docs/` folder is up to date if applicable.
- Do not add general product or user-facing documentation to the `docs/` folder. The official Codex documentation lives elsewhere. The exception is app-server API documentation, which is covered by the app-server guidance below.
- Prefer private modules and explicitly exported public crate API.
- If you change `ConfigToml` or nested config types, run `just write-config-schema` to update `codex-rs/core/config.schema.json`.
- When working with MCP tool calls, prefer using `codex-rs/codex-mcp/src/mcp_connection_manager.rs` to handle mutation of tools and tool calls. Aim to minimize the footprint of changes and leverage existing abstractions rather than plumbing code through multiple levels of function calls.
@@ -210,7 +210,7 @@ These guidelines apply to app-server protocol work in `codex-rs`, especially:
### Development Workflow
- Update docs/examples when API behavior changes (at minimum `app-server/README.md`).
- Update app-server docs/examples when API behavior changes (at minimum `app-server/README.md`).
- Regenerate schema fixtures when API shapes change:
`just write-app-server-schema`
(and `just write-app-server-schema --experimental` when experimental API fixtures are affected).
`bootstrap` requires the standalone managed install. It records the daemon
settings under `CODEX_HOME/app-server-daemon/`, starts app-server as a
pidfile-backed detached process, and launches a detached updater loop.
## Installation and update cases
The daemon assumes Codex is installed through `install.sh` and always launches
the standalone managed binary under `CODEX_HOME`.
| Situation | What starts | Does this daemon fetch new binaries? | Does a running app-server eventually move to a newer binary on its own? |
| --- | --- | --- | --- |
| `install.sh` has run, but only `start` is used | `start` uses `CODEX_HOME/packages/standalone/current/codex` | No | No. The managed path is used when starting or restarting, but no updater is installed. |
| `install.sh` has run, then `bootstrap` is used | The pidfile backend uses `CODEX_HOME/packages/standalone/current/codex` | Yes. Bootstrap launches a detached updater loop that runs `install.sh` hourly. | Yes, while that updater process is alive and app-server is already running. After a successful fetch, the updater restarts app-server with the refreshed binary and only then replaces its own process image. |
| Some other tool updates the managed binary path | The next fresh start or restart uses the updated file at that path | Only if `bootstrap` is active, because the updater still runs `install.sh` on its normal cadence. | Without `bootstrap`, no. With `bootstrap`, the next successful updater pass compares the managed binary contents after `install.sh` runs; if app-server is running and they differ from the updater's current image, it refreshes app-server first and then itself. |
### Standalone installs
For installs created by `install.sh`:
- lifecycle commands always use the standalone managed binary path
- `bootstrap` is supported
- `bootstrap` starts a detached pid-backed updater loop that fetches via
`install.sh`
- after a successful refresh, if app-server is running and the managed binary
contents changed, the updater restarts app-server with that binary first and
only then replaces its own process image
- the updater loop is not reboot-persistent; it must be started again by
rerunning `bootstrap` after a reboot
### Out-of-band updates
This daemon does not watch arbitrary executable files for replacement. If some
other tool updates the managed binary path:
- without `bootstrap`, a currently running app-server remains on the old
executable image until an explicit `restart`
- with `bootstrap`, the detached updater loop notices the changed managed
binary on its next successful scheduled pass after running `install.sh`; if
app-server is running, it refreshes app-server first and then refreshes itself
once that replacement starts successfully
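The ordering described above (refresh app-server first, then the updater itself, and only after the replacement starts successfully) can be sketched as a small decision function. This is an illustration under stated assumptions, not the daemon's actual code: `PassOutcome`, `UpdaterStep`, and `run_pass` are hypothetical names, and the case where the binary changed but app-server is not running is not specified in the text, so the sketch does nothing there.

```rust
/// Steps the updater takes after one scheduled pass, in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpdaterStep {
    RestartAppServer,
    ReplaceOwnProcessImage,
}

/// What the updater observed after running `install.sh` (hypothetical fields).
#[derive(Debug, Clone, Copy)]
pub struct PassOutcome {
    pub install_succeeded: bool,
    pub binary_differs_from_updater_image: bool,
    pub app_server_running: bool,
}

/// Returns the steps attempted during one pass.
/// `restart_app_server` stands in for the real restart and reports whether
/// the replacement app-server started successfully.
pub fn run_pass(
    outcome: PassOutcome,
    restart_app_server: impl FnOnce() -> bool,
) -> Vec<UpdaterStep> {
    let mut steps = Vec::new();

    // Nothing to do unless the fetch succeeded, the binary changed, and
    // app-server is running. The not-running case is an assumption here.
    if !outcome.install_succeeded
        || !outcome.binary_differs_from_updater_image
        || !outcome.app_server_running
    {
        return steps;
    }

    // App-server is refreshed first.
    steps.push(UpdaterStep::RestartAppServer);

    // The updater replaces itself only once the replacement has started.
    if restart_app_server() {
        steps.push(UpdaterStep::ReplaceOwnProcessImage);
    }

    steps
}
```

Keeping the self-replacement conditional on a successful restart means a failed app-server start leaves the old updater alive to retry on its next pass.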
## Lifecycle semantics
`start` is idempotent and returns after app-server is ready to answer the normal
JSON-RPC initialize handshake on the Unix control socket.
`restart` stops any managed daemon and starts it again.
`enable-remote-control` and `disable-remote-control` persist the launch setting
for future starts. If a managed app-server is already running, they restart it
so the new setting takes effect immediately.
Top-level `codex remote-control` bootstraps with `--remote-control` when the
updater loop is not running. Otherwise it enables remote control and starts the
daemon normally.
`stop` sends a graceful termination request first, then sends a second
termination signal after the grace window if the process is still alive.
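The two-stage `stop` behavior can be sketched as follows. This is a rough illustration rather than the daemon's implementation: the `ManagedProcess` trait, the `stop` signature, the polling interval, and the `FakeProcess` stand-in are all hypothetical, and the text does not say which signals are used, so the sketch treats both requests as an abstract "termination request".

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical abstraction over the managed app-server process.
pub trait ManagedProcess {
    fn request_termination(&mut self);
    fn is_alive(&self) -> bool;
}

/// Sends one termination request, waits up to `grace`, and sends a second
/// request if the process is still alive. Returns how many requests were sent.
pub fn stop<P: ManagedProcess>(process: &mut P, grace: Duration, poll: Duration) -> u32 {
    process.request_termination();
    let mut requests_sent = 1;

    // Poll until the process exits or the grace window elapses.
    let deadline = Instant::now() + grace;
    while process.is_alive() && Instant::now() < deadline {
        thread::sleep(poll);
    }

    if process.is_alive() {
        process.request_termination();
        requests_sent += 1;
    }
    requests_sent
}

/// In-memory stand-in used only to exercise the sketch:
/// it "exits" once it has seen `requests_needed` termination requests.
pub struct FakeProcess {
    pub requests_needed: u32,
    pub requests_seen: u32,
}

impl ManagedProcess for FakeProcess {
    fn request_termination(&mut self) {
        self.requests_seen += 1;
    }

    fn is_alive(&self) -> bool {
        self.requests_seen < self.requests_needed
    }
}
```

With a `FakeProcess` that needs one request, `stop` returns after a single request; one that needs two waits out the grace window before the second request is sent.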
All mutating lifecycle commands are serialized per `CODEX_HOME`, so a concurrent
"description":"Unix timestamp (in milliseconds) when this review completed.",
"format":"int64",
"type":"integer"
},
"decisionSource":{
"$ref":"#/definitions/AutoReviewDecisionSource"
},
@@ -1973,6 +1978,11 @@
"description":"Stable identifier for this review.",
"type":"string"
},
"startedAtMs":{
"description":"Unix timestamp (in milliseconds) when this review started.",
"format":"int64",
"type":"integer"
},
"targetItemId":{
"description":"Identifier for the reviewed item or tool call when one exists.\n\nIn most cases, one review maps to one target item. The exceptions are - execve reviews, where a single command may contain multiple execve calls to review (only possible when using the shell_zsh_fork feature) - network policy reviews, where there is no target item\n\nA network call is triggered by a CommandExecution item, so having a target_item_id set to the CommandExecution item would be misleading because the review is about the network call, not the command execution. Therefore, target_item_id is set to None for network policy reviews.",
"type":[
@@ -1989,9 +1999,11 @@
},
"required":[
"action",
"completedAtMs",
"decisionSource",
"review",
"reviewId",
"startedAtMs",
"threadId",
"turnId"
],
@@ -2010,6 +2022,11 @@
"description":"Stable identifier for this review.",
"type":"string"
},
"startedAtMs":{
"description":"Unix timestamp (in milliseconds) when this review started.",
"format":"int64",
"type":"integer"
},
"targetItemId":{
"description":"Identifier for the reviewed item or tool call when one exists.\n\nIn most cases, one review maps to one target item. The exceptions are - execve reviews, where a single command may contain multiple execve calls to review (only possible when using the shell_zsh_fork feature) - network policy reviews, where there is no target item\n\nA network call is triggered by a CommandExecution item, so having a target_item_id set to the CommandExecution item would be misleading because the review is about the network call, not the command execution. Therefore, target_item_id is set to None for network policy reviews.",
"type":[
@@ -2028,6 +2045,7 @@
"action",
"review",
"reviewId",
"startedAtMs",
"threadId",
"turnId"
],
@@ -2719,7 +2737,7 @@
"type":"string"
},
"RemoteControlStatusChangedNotification":{
"description":"Current remote-control connection status and environment id exposed to clients.",
"description":"Current remote-control connection status and remote identity exposed to clients.",
"description":"Generate a fresh upstream attestation result on demand.",
"properties":{
"id":{
"$ref":"#/definitions/RequestId"
},
"method":{
"enum":[
"attestation/generate"
],
"title":"Attestation/generateRequestMethod",
"type":"string"
},
"params":{
"$ref":"#/definitions/AttestationGenerateParams"
}
},
"required":[
"id",
"method",
"params"
],
"title":"Attestation/generateRequest",
"type":"object"
},
{
"description":"DEPRECATED APIs below Request to approve a patch. This request is used for Turns started via the legacy APIs (i.e. SendUserTurn, SendUserMessage).",
"description":"Generate a fresh upstream attestation result on demand.",
"properties":{
"id":{
"$ref":"#/definitions/v2/RequestId"
},
"method":{
"enum":[
"attestation/generate"
],
"title":"Attestation/generateRequestMethod",
"type":"string"
},
"params":{
"$ref":"#/definitions/AttestationGenerateParams"
}
},
"required":[
"id",
"method",
"params"
],
"title":"Attestation/generateRequest",
"type":"object"
},
{
"description":"DEPRECATED APIs below Request to approve a patch. This request is used for Turns started via the legacy APIs (i.e. SendUserTurn, SendUserMessage).",
"description":"Stable identifier for this review.",
"type":"string"
},
"startedAtMs":{
"description":"Unix timestamp (in milliseconds) when this review started.",
"format":"int64",
"type":"integer"
},
"targetItemId":{
"description":"Identifier for the reviewed item or tool call when one exists.\n\nIn most cases, one review maps to one target item. The exceptions are - execve reviews, where a single command may contain multiple execve calls to review (only possible when using the shell_zsh_fork feature) - network policy reviews, where there is no target item\n\nA network call is triggered by a CommandExecution item, so having a target_item_id set to the CommandExecution item would be misleading because the review is about the network call, not the command execution. Therefore, target_item_id is set to None for network policy reviews.",
"type":[
@@ -9904,9 +10042,11 @@
},
"required":[
"action",
"completedAtMs",
"decisionSource",
"review",
"reviewId",
"startedAtMs",
"threadId",
"turnId"
],
@@ -9927,6 +10067,11 @@
"description":"Stable identifier for this review.",
"type":"string"
},
"startedAtMs":{
"description":"Unix timestamp (in milliseconds) when this review started.",
"format":"int64",
"type":"integer"
},
"targetItemId":{
"description":"Identifier for the reviewed item or tool call when one exists.\n\nIn most cases, one review maps to one target item. The exceptions are - execve reviews, where a single command may contain multiple execve calls to review (only possible when using the shell_zsh_fork feature) - network policy reviews, where there is no target item\n\nA network call is triggered by a CommandExecution item, so having a target_item_id set to the CommandExecution item would be misleading because the review is about the network call, not the command execution. Therefore, target_item_id is set to None for network policy reviews.",
"type":[
@@ -9945,6 +10090,7 @@
"action",
"review",
"reviewId",
"startedAtMs",
"threadId",
"turnId"
],
@@ -11943,6 +12089,13 @@
"null"
]
},
"defaultPromptAlias":{
"description":"Mention label used before starter prompts on plugin detail surfaces.",
"description":"Unix timestamp (in milliseconds) when this review completed.",
"format":"int64",
"type":"integer"
},
"decisionSource":{
"$ref":"#/definitions/AutoReviewDecisionSource"
},
@@ -6499,6 +6570,11 @@
"description":"Stable identifier for this review.",
"type":"string"
},
"startedAtMs":{
"description":"Unix timestamp (in milliseconds) when this review started.",
"format":"int64",
"type":"integer"
},
"targetItemId":{
"description":"Identifier for the reviewed item or tool call when one exists.\n\nIn most cases, one review maps to one target item. The exceptions are - execve reviews, where a single command may contain multiple execve calls to review (only possible when using the shell_zsh_fork feature) - network policy reviews, where there is no target item\n\nA network call is triggered by a CommandExecution item, so having a target_item_id set to the CommandExecution item would be misleading because the review is about the network call, not the command execution. Therefore, target_item_id is set to None for network policy reviews.",
"type":[
@@ -6515,9 +6591,11 @@
},
"required":[
"action",
"completedAtMs",
"decisionSource",
"review",
"reviewId",
"startedAtMs",
"threadId",
"turnId"
],
@@ -6538,6 +6616,11 @@
"description":"Stable identifier for this review.",
"type":"string"
},
"startedAtMs":{
"description":"Unix timestamp (in milliseconds) when this review started.",
"format":"int64",
"type":"integer"
},
"targetItemId":{
"description":"Identifier for the reviewed item or tool call when one exists.\n\nIn most cases, one review maps to one target item. The exceptions are - execve reviews, where a single command may contain multiple execve calls to review (only possible when using the shell_zsh_fork feature) - network policy reviews, where there is no target item\n\nA network call is triggered by a CommandExecution item, so having a target_item_id set to the CommandExecution item would be misleading because the review is about the network call, not the command execution. Therefore, target_item_id is set to None for network policy reviews.",
"type":[
@@ -6556,6 +6639,7 @@
"action",
"review",
"reviewId",
"startedAtMs",
"threadId",
"turnId"
],
@@ -8554,6 +8638,13 @@
"null"
]
},
"defaultPromptAlias":{
"description":"Mention label used before starter prompts on plugin detail surfaces.",
"description":"Unix timestamp (in milliseconds) when this review completed.",
"format":"int64",
"type":"integer"
},
"decisionSource":{
"$ref":"#/definitions/AutoReviewDecisionSource"
},
@@ -584,6 +589,11 @@
"description":"Stable identifier for this review.",
"type":"string"
},
"startedAtMs":{
"description":"Unix timestamp (in milliseconds) when this review started.",
"format":"int64",
"type":"integer"
},
"targetItemId":{
"description":"Identifier for the reviewed item or tool call when one exists.\n\nIn most cases, one review maps to one target item. The exceptions are - execve reviews, where a single command may contain multiple execve calls to review (only possible when using the shell_zsh_fork feature) - network policy reviews, where there is no target item\n\nA network call is triggered by a CommandExecution item, so having a target_item_id set to the CommandExecution item would be misleading because the review is about the network call, not the command execution. Therefore, target_item_id is set to None for network policy reviews.",
"description":"Stable identifier for this review.",
"type":"string"
},
"startedAtMs":{
"description":"Unix timestamp (in milliseconds) when this review started.",
"format":"int64",
"type":"integer"
},
"targetItemId":{
"description":"Identifier for the reviewed item or tool call when one exists.\n\nIn most cases, one review maps to one target item. The exceptions are - execve reviews, where a single command may contain multiple execve calls to review (only possible when using the shell_zsh_fork feature) - network policy reviews, where there is no target item\n\nA network call is triggered by a CommandExecution item, so having a target_item_id set to the CommandExecution item would be misleading because the review is about the network call, not the command execution. Therefore, target_item_id is set to None for network policy reviews.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
Some files were not shown because too many files have changed in this diff