## Why
`codex-tools` already owns the shared tool input schema model and parser
from the first extraction step, but `core/src/tools/spec.rs` still owned
the MCP-specific adapter that normalizes `rmcp::model::Tool` schemas and
wraps `structuredContent` into the call result output schema.
Keeping that adapter in `codex-core` means the reusable MCP schema path
is still split across crates, and the unit tests for that logic stay
anchored in `codex-core` even though the runtime orchestration does not
need to move yet.
This change takes the next small step by moving the reusable MCP schema
adapter into `codex-tools` while leaving `ResponsesApiTool` assembly in
`codex-core`.
## What changed
- added `tools/src/mcp_tool.rs` and sibling
`tools/src/mcp_tool_tests.rs`
- introduced `ParsedMcpTool`, `parse_mcp_tool()`, and
`mcp_call_tool_result_output_schema()` in `codex-tools`
- updated `core/src/tools/spec.rs` to consume parsed MCP tool parts from
`codex-tools`
- removed the now-redundant MCP schema unit tests from
`core/src/tools/spec_tests.rs`
- expanded `codex-rs/tools/README.md` to describe this second migration
step
## Test plan
- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib tools::spec::`
## Summary
This PR makes Windows sandbox proxying enforceable by routing proxy-only
runs through the existing `offline` sandbox user and reserving direct
network access for the existing `online` sandbox user.
In brief:
- if a Windows sandbox run should be proxy-enforced, we run it as the
`offline` user
- the `offline` user gets firewall rules that block direct outbound
traffic and only permit the configured localhost proxy path
- if a Windows sandbox run should have true direct network access, we
run it as the `online` user
- no new sandbox identity is introduced
This brings Windows in line with the intended model: proxy use is not
just env-based, it is backed by OS-level egress controls. Windows
already has two sandbox identities:
- `offline`: intended to have no direct network egress
- `online`: intended to have full network access
This PR makes proxy-enforced runs use that model directly.
### Proxy-enforced runs
When proxy enforcement is active:
- the run is assigned to the `offline` identity
- setup extracts the loopback proxy ports from the sandbox env
- Windows setup programs firewall rules for the `offline` user that:
- block all non-loopback outbound traffic
- block loopback UDP
- block loopback TCP except for the configured proxy ports
- optionally allow broader localhost access when `allow_local_binding=1`
So the sandboxed process can only talk to the local proxy. It cannot
open direct outbound sockets or do local UDP-based DNS on its own.The
proxy then performs the real outbound network access outside that
restricted sandbox identity.
### Direct-network runs
When proxy enforcement is not active and full network access is allowed:
- the run is assigned to the `online` identity
- no proxy-only firewall restrictions are applied
- the process gets normal direct network access
### Unelevated vs elevated
The restricted-token / unelevated path cannot enforce per-identity
firewall policy by itself.
So for Windows proxy-enforced runs, we transparently use the logon-user
sandbox path under the hood, even if the caller started from the
unelevated mode. That keeps enforcement real instead of best-effort.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`PermissionProfile` should only describe the per-command permissions we
still want to grant dynamically. Keeping
`MacOsSeatbeltProfileExtensions` in that surface forced extra macOS-only
approval, protocol, schema, and TUI branches for a capability we no
longer want to expose.
## What changed
- Removed the macOS-specific permission-profile types from
`codex-protocol`, the app-server v2 API, and the generated
schema/TypeScript artifacts.
- Deleted the core and sandboxing plumbing that threaded
`MacOsSeatbeltProfileExtensions` through execution requests and seatbelt
construction.
- Simplified macOS seatbelt generation so it always includes the fixed
read-only preferences allowlist instead of carrying a configurable
profile extension.
- Removed the macOS additional-permissions UI/docs/test coverage and
deleted the obsolete macOS permission modules.
- Tightened `request_permissions` intersection handling so explicitly
empty requested read lists are preserved only when that field was
actually granted, avoiding zero-grant responses being stored as active
permissions.
## Why
`parse_tool_input_schema` and the supporting `JsonSchema` model were
living in `core/src/tools/spec.rs`, but they already serve callers
outside `codex-core`.
Keeping that shared schema parsing logic inside `codex-core` makes the
crate boundary harder to reason about and works against the guidance in
`AGENTS.md` to avoid growing `codex-core` when reusable code can live
elsewhere.
This change takes the first extraction step by moving the schema parsing
primitive into its own crate while keeping the rest of the tool-spec
assembly in `codex-core`.
## What changed
- added a new `codex-tools` crate under `codex-rs/tools`
- moved the shared tool input schema model and sanitizer/parser into
`tools/src/json_schema.rs`
- kept `tools/src/lib.rs` exports-only, with the module-level unit tests
split into `json_schema_tests.rs`
- updated `codex-core` to use `codex-tools::JsonSchema` and re-export
`parse_tool_input_schema`
- updated `codex-app-server` dynamic tool validation to depend on
`codex-tools` directly instead of reaching through `codex-core`
- wired the new crate into the Cargo workspace and Bazel build graph
Replaced ci.bazelrc and v8-ci.bazelrc by custom configs inside the main
.bazelrc file. As a result, github workflows setup is simplified down to
a single '--config=<foo>' flag usage.
Moved the build metadata flags to config=ci.
Added custom tags metadata to help differentiate invocations based on
workflow (bazel vs v8) and os (linux/macos/windows).
Enabled users to override the default values in .bazelrc by using a
user.bazelrc file locally.
Added user.bazelrc to gitignore.
Highlights:
- Trimmed down to just the repository cache for faster upload / download
- Made the cache key only include files that affect external
dependencies (since that's what the repository cache caches) -
MODULE.bazel, codex-rs/Cargo.lock, codex-rs/Cargo.toml
- Split the caching action in to explicit restore / save steps (similar
to your rust CI) which allows us to skip uploads on cache hit, and not
fail the build if upload fails
This should get rid of 842 network fetches that are happening on every
Bazel CI run, while also reducing the Github flakiness @bolinfest
reported. Uploading should be faster (since we're not caching many small
files), and will only happen when MODULE.bazel or Cargo.lock /
Cargo.toml files change.
In my testing, it [took 3s to save the repository
cache](https://github.com/siggisim/codex/actions/runs/23014186143/job/66832859781).
## Summary
Fail closed when the network proxy's local/private IP pre-check hits a
DNS lookup error or timeout, instead of treating the hostname as public
and allowing the request.
## Root cause
`host_resolves_to_non_public_ip()` returned `false` on resolver failure,
which created a fail-open path in the `allow_local_binding = false`
boundary. The eventual connect path performs its own DNS resolution
later, so a transient pre-check failure is not evidence that the
destination is public.
## Changes
- Treat DNS lookup errors/timeouts as local/private for blocking
purposes
- Add a regression test for an allowlisted hostname that fails DNS
resolution
## Validation
- `cargo test -p codex-network-proxy`
- `cargo clippy -p codex-network-proxy --all-targets -- -D warnings`
- `just fmt`
- `just argument-comment-lint`
## Why
This is effectively a follow-up to
[#15812](https://github.com/openai/codex/pull/15812). That change
removed the special skill-script exec path, but `skill_metadata` was
still being threaded through command-approval payloads even though the
approval flow no longer uses it to render prompts or resolve decisions.
Keeping it around added extra protocol, schema, and client surface area
without changing behavior.
Removing it keeps the command-approval contract smaller and avoids
carrying a dead field through app-server, TUI, and MCP boundaries.
## What changed
- removed `ExecApprovalRequestSkillMetadata` and the corresponding
`skillMetadata` field from core approval events and the v2 app-server
protocol
- removed the generated JSON and TypeScript schema output for that field
- updated app-server, MCP server, TUI, and TUI app-server approval
plumbing to stop forwarding the field
- cleaned up tests that previously constructed or asserted
`skillMetadata`
## Testing
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `cargo test -p codex-app-server-test-client`
- `cargo test -p codex-mcp-server`
- `just argument-comment-lint`
## Summary
- move the bwrap PATH lookup and warning helpers out of config/mod.rs
- move the related tests into a dedicated bwrap_tests.rs file
## Validation
- git diff --check
- skipped heavier local tests per request
Follow-up to #15791.
## Why
`codex-core` is already the largest crate in `codex-rs`, so defaulting
to it for new functionality makes it harder to keep the workspace
modular. The repo guidance should make it explicit that contributors are
expected to look for an existing non-`codex-core` crate, or introduce a
new crate, before growing `codex-core` further.
## What Changed
- Added a dedicated `The \`codex-core\` crate` section to `AGENTS.md`.
- Documented why `codex-core` should be treated as a last resort for new
functionality.
- Added concrete guidance for both implementation and review: prefer an
existing non-`codex-core` crate when possible, introduce a new workspace
crate when that is the cleaner boundary, and push back on PRs that grow
`codex-core` unnecessarily.
## Why
`SandboxCommand.program` represents an executable path, but keeping it
as `String` forced path-backed callers to run `to_string_lossy()` before
the sandbox layer ever touched the command. That loses fidelity earlier
than necessary and adds avoidable conversions in runtimes that already
have a `PathBuf`.
## What changed
- Changed `SandboxCommand.program` to `OsString`.
- Updated `SandboxManager::transform` to keep the program and argv in
`OsString` form until the `SandboxExecRequest` conversion boundary.
- Switched the path-backed `apply_patch` and `js_repl` runtimes to pass
`into_os_string()` instead of `to_string_lossy()`.
- Updated the remaining string-backed builders and tests to match the
new type while preserving the existing Linux helper `arg0` behavior.
## Verification
- `cargo test -p codex-sandboxing`
- `just argument-comment-lint -p codex-core -p codex-sandboxing`
- `cargo test -p codex-core` currently fails in unrelated existing
config tests: `config::tests::approvals_reviewer_*` and
`config::tests::smart_approvals_alias_*`
## Why
`token_data` is owned by `codex-login`, but `codex-core` was still
re-exporting it. That let callers pull auth token types through
`codex-core`, which keeps otherwise unrelated crates coupled to
`codex-core` and makes `codex-core` more of a build-graph bottleneck.
## What changed
- remove the `codex-core` re-export of `codex_login::token_data`
- update the remaining `codex-core` internals that used
`crate::token_data` to import `codex_login::token_data` directly
- update downstream callers in `codex-rs/chatgpt`,
`codex-rs/tui_app_server`, `codex-rs/app-server/tests/common`, and
`codex-rs/core/tests` to import `codex_login::token_data` directly
- add explicit `codex-login` workspace dependencies and refresh lock
metadata for crates that now depend on it directly
## Validation
- `cargo test -p codex-chatgpt --locked`
- `just argument-comment-lint`
- `just bazel-lock-update`
- `just bazel-lock-check`
## Notes
- attempted `cargo test -p codex-core --locked` and `cargo test -p
codex-core auth_refresh --locked`, but both ran out of disk while
linking `codex-core` test binaries in the local environment
## Problem
Codex already treated an existing top-level project `./.codex` directory
as protected, but there was a gap on first creation.
If `./.codex` did not exist yet, a turn could create files under it,
such as `./.codex/config.toml`, without going through the same approval
path as later modifications. That meant the initial write could bypass
the intended protection for project-local Codex state.
## What this changes
This PR closes that first-creation gap in the Unix enforcement layers:
- `codex-protocol`
- treat the top-level project `./.codex` path as a protected carveout
even when it does not exist yet
- avoid injecting the default carveout when the user already has an
explicit rule for that exact path
- macOS Seatbelt
- deny writes to both the exact protected path and anything beneath it,
so creating `./.codex` itself is blocked in addition to writes inside it
- Linux bubblewrap
- preserve the same protected-path behavior for first-time creation
under `./.codex`
- tests
- add protocol regressions for missing `./.codex` and explicit-rule
collisions
- add Unix sandbox coverage for blocking first-time `./.codex` creation
- tighten Seatbelt policy assertions around excluded subpaths
## Scope
This change is intentionally scoped to protecting the top-level project
`.codex` subtree from agent writes.
It does not make `.codex` unreadable, and it does not change the product
behavior around loading project skills from `.codex` when project config
is untrusted.
## Why this shape
The fix is pointed rather than broad:
- it preserves the current model of “project `.codex` is protected from
writes”
- it closes the security-relevant first-write hole
- it avoids folding a larger permissions-model redesign into this PR
## Validation
- `cargo test -p codex-protocol`
- `cargo test -p codex-sandboxing seatbelt`
- `cargo test -p codex-exec --test all
sandbox_blocks_first_time_dot_codex_creation -- --nocapture`
---------
Co-authored-by: Michael Bolin <mbolin@openai.com>
## Why
Skill metadata accepted a `permissions` block and stored the result on
`SkillMetadata`, but that data was never consumed by runtime behavior.
Leaving the dead parsing path in place makes it look like skills can
widen or otherwise influence execution permissions when, in practice,
declared skill permissions are ignored.
This change removes that misleading surface area so the skill metadata
model matches what the system actually uses.
## What changed
- removed `permission_profile` and `managed_network_override` from
`core-skills::SkillMetadata`
- stopped parsing `permissions` from skill metadata in
`core-skills/src/loader.rs`
- deleted the loader tests that only exercised the removed permissions
parsing path
- cleaned up dependent `SkillMetadata` constructors in tests and TUI
code that were only carrying `None` for those fields
## Testing
- `cargo test -p codex-core-skills`
- `cargo test -p codex-tui
submission_prefers_selected_duplicate_skill_path`
- `just argument-comment-lint`
## Summary
- resolve system bwrap from PATH instead of hardcoding /usr/bin/bwrap
- skip PATH entries that resolve inside the current workspace before
launching the sandbox helper
- keep the vendored bubblewrap fallback when no trusted system bwrap is
found
## Validation
- cargo test -p codex-core bwrap --lib
- cargo test -p codex-linux-sandbox
- just fix -p codex-core
- just fix -p codex-linux-sandbox
- just fmt
- just argument-comment-lint
- cargo clean
- [x] Polish tool suggest prompts to distinguish between missing
connectors and discoverable plugins, and be very precise about the
triggering conditions.
## TR;DR
Replicates the `/title` command from `tui` to `tui_app_server`.
## Problem
The classic `tui` crate supports customizing the terminal window/tab
title via `/title`, but the `tui_app_server` crate does not. Users on
the app-server path have no way to configure what their terminal title
shows (project name, status, spinner, thread, etc.), making it harder to
identify Codex sessions across tabs or windows.
## Mental model
The terminal title is a *status surface* -- conceptually parallel to the
footer status line. Both surfaces are configurable lists of items, both
share expensive inputs (git branch lookup, project root discovery), and
both must be refreshed at the same lifecycle points. This change ports
the classic `tui`'s design verbatim:
1. **`terminal_title.rs`** owns the low-level OSC write path and input
sanitization. It strips control characters and bidi/invisible codepoints
before placing untrusted text (model output, thread names, project
paths) inside an escape sequence.
2. **`title_setup.rs`** defines `TerminalTitleItem` (the 8 configurable
items) and `TerminalTitleSetupView` (the interactive picker that wraps
`MultiSelectPicker`).
3. **`status_surfaces.rs`** is the shared refresh pipeline. It parses
both surface configs once per refresh, warns about invalid items once
per session, synchronizes the git-branch cache, then renders each
surface from the same `StatusSurfaceSelections` snapshot.
4. **`chatwidget.rs`** sets `TerminalTitleStatusKind` at each state
transition (Working, Thinking, Undoing, WaitingForBackgroundTerminal)
and calls `refresh_terminal_title()` whenever relevant state changes.
5. **`app.rs`** handles the three setup events (confirm/preview/cancel),
persists config via `ConfigEditsBuilder`, and clears the managed title
on `Drop`.
## Non-goals
- **Restoring the previous terminal title on exit.** There is no
portable way to read the terminal's current title, so `Drop` clears the
managed title rather than restoring it.
- **Sharing code between `tui` and `tui_app_server`.** The
implementation is a parallel copy, matching the existing pattern for the
status-line feature. Extracting a shared crate is future work.
## Tradeoffs
- **Duplicate code across crates.** The three core files
(`terminal_title.rs`, `title_setup.rs`, `status_surfaces.rs`) are
byte-for-byte copies from the classic `tui`. This was chosen for
consistency with the existing status-line port and to avoid coupling the
two crates at the dependency level. Future changes must be applied in
both places.
- **`status_surfaces.rs` is large (~660 lines).** It absorbs logic that
previously lived inline in `chatwidget.rs` (status-line refresh, git
branch management, project root discovery) plus all new terminal-title
logic. This consolidation trades file size for a single place where both
surfaces are coordinated.
- **Spinner scheduling on every refresh.** The terminal title spinner
(when active) schedules a frame every 100ms. This is the same pattern
the status-indicator spinner already uses; the overhead is a timer
registration, not a redraw.
## Architecture
```
/title command
-> SlashCommand::Title
-> open_terminal_title_setup()
-> TerminalTitleSetupView (MultiSelectPicker)
-> on_change: AppEvent::TerminalTitleSetupPreview -> preview_terminal_title()
-> on_confirm: AppEvent::TerminalTitleSetup -> ConfigEditsBuilder + setup_terminal_title()
-> on_cancel: AppEvent::TerminalTitleSetupCancelled -> cancel_terminal_title_setup()
Runtime title refresh:
state change (turn start, reasoning, undo, plan update, thread rename, ...)
-> set terminal_title_status_kind
-> refresh_terminal_title()
-> status_surface_selections() (parse configs, collect invalids)
-> refresh_terminal_title_from_selections()
-> terminal_title_value_for_item() for each configured item
-> assemble title string with separators
-> skip if identical to last_terminal_title (dedup OSC writes)
-> set_terminal_title() (sanitize + OSC 0 write)
-> schedule spinner frame if animating
Widget replacement:
replace_chat_widget_with_app_server_thread()
-> transfer last_terminal_title from old widget to new
-> avoids redundant OSC clear+rewrite on session switch
```
## Observability
- Invalid terminal-title item IDs in config emit a one-per-session
warning via `on_warning()` (gated by
`terminal_title_invalid_items_warned` `AtomicBool`).
- OSC write failures are logged at `tracing::debug` level.
- Config persistence failures are logged at `tracing::error` and
surfaced to the user via `add_error_message()`.
## Tests
- `terminal_title.rs`: 4 unit tests covering sanitization (control
chars, bidi codepoints, truncation) and OSC output format.
- `title_setup.rs`: 3 tests covering setup view snapshot rendering,
parse order preservation, and invalid-ID rejection.
- `chatwidget/tests.rs`: Updated test helpers with new fields; existing
tests continue to pass.
---------
Co-authored-by: Eric Traut <etraut@openai.com>
## Summary
Add a focused codex network proxy unit test for the denylist pattern
with wildcard in the middle `region*.some.malicious.tunnel.com`. This
does not change how existing code works, just ensure that behavior stays
the same and we got CI guards to guard existin behavior.
## Why
The managed Codex denylist update relies on this mid label glob form,
and the existing tests only covered exact hosts, `*.` subdomains, and
`**.` apex plus subdomains.
## Validation
`cargo test -p codex-network-proxy
compile_globset_supports_mid_label_wildcards`
`cargo test -p codex-network-proxy`
`./tools/argument-comment-lint/run-prebuilt-linter.sh -p
codex-network-proxy`
## Summary
- block git global options that can redirect config, repository, or
helper lookup from being auto-approved as safe
- share the unsafe global-option predicate across the Unix and Windows
git safety checks
- add regression coverage for inline and split forms, including `bash
-lc` and PowerShell wrappers
## Root cause
The Unix safe-command gate only rejected `-c` and `--config-env`, even
though the shared git parser already knew how to skip additional
pre-subcommand globals such as `--git-dir`, `--work-tree`,
`--exec-path`, `--namespace`, and `--super-prefix`. That let those
arguments slip through safe-command classification on otherwise
read-only git invocations and bypass approval. The Windows-specific
safe-command path had the same trust-boundary gap for git global
options.
## Why
`#[large_stack_test]` made the `apply_patch_cli` tests pass by giving
them more stack, but it did not address why those tests needed the extra
stack in the first place.
The real problem is the async state built by the `apply_patch_cli`
harness path. Those tests await three helper boundaries directly:
harness construction, turn submission, and apply-patch output
collection. If those helpers inline their full child futures, the test
future grows to include the whole harness startup and request/response
path.
This change replaces the workaround from #12768 with the same basic
approach used in #13429, but keeps the fix narrower: only the helper
boundaries awaited directly by `apply_patch_cli` stay boxed.
## What Changed
- removed `#[large_stack_test]` from
`core/tests/suite/apply_patch_cli.rs`
- restored ordinary `#[tokio::test(flavor = "multi_thread",
worker_threads = 2)]` annotations in that suite
- deleted the now-unused `codex-test-macros` crate and removed its
workspace wiring
- boxed only the three helper boundaries that the suite awaits directly:
- `apply_patch_harness_with(...)`
- `TestCodexHarness::submit(...)`
- `TestCodexHarness::apply_patch_output(...)`
- added comments at those boxed boundaries explaining why they remain
boxed
## Testing
- `cargo test -p codex-core --test all suite::apply_patch_cli --
--nocapture`
## References
- #12768
- #13429
## Summary
- enrich `codex.mcp.call` with `tool`, `connector_id`, and sanitized
`connector_name` for actual MCP executions
- record `codex.mcp.call.duration_ms` for actual MCP executions so
connector-level latency is visible in metrics
- keep skipped, blocked, declined, and cancelled paths on the plain
status-only `codex.mcp.call` counter
## Included Changes
- `codex-rs/core/src/mcp_tool_call.rs`: add connector-sliced MCP count
and duration metrics only for executed tool calls, while leaving
non-executed outcomes as status-only counts
- `codex-rs/core/src/mcp_tool_call_tests.rs`: cover metric tag shaping,
connector-name sanitization, and the new duration metric tags
## Testing
- `cargo test -p codex-core`
- `just fix -p codex-core`
- `just fmt`
## Notes
- `cargo test -p codex-core` still hits existing unrelated failures in
approvals-reviewer config tests and the sandboxed JS REPL `mktemp` test
- full workspace `cargo test` was not run
---------
Co-authored-by: Codex <noreply@openai.com>
## Symptoms
When `/review` ran through `tui_app_server`, the TUI could show
duplicate review content:
- the `>> Code review started: ... <<` banner appeared twice
- the final review body could also appear twice
## Problem
`tui_app_server` was treating review lifecycle items as renderable
content on more than one delivery path.
Specifically:
- `EnteredReviewMode` was rendered both when the item started and again
when it completed
- `ExitedReviewMode` rendered the review text itself, even though the
same review text was also delivered later as the assistant message item
That meant the same logical review event was committed into history
multiple times.
## Solution
Make review lifecycle items control state transitions only once, and
keep the final review body sourced from the assistant message item:
- render the review-start banner from the live `ItemStarted` path, while
still allowing replay to restore it once
- treat `ExitedReviewMode` as a mode-exit/finish-banner event instead of
rendering the review body from it
- preserve the existing assistant-message rendering path as the single
source of final review text
This PR partially rebase `unified_exec` on the `exec-server` and adapt
the `exec-server` accordingly.
## What changed in `exec-server`
1. Replaced the old "broadcast-driven; process-global" event model with
process-scoped session events. The goal is to be able to have dedicated
handler for each process.
2. Add to protocol contract to support explicit lifecycle status and
stream ordering:
- `WriteResponse` now returns `WriteStatus` (Accepted, UnknownProcess,
StdinClosed, Starting) instead of a bool.
- Added seq fields to output/exited notifications.
- Added terminal process/closed notification.
3. Demultiplexed remote notifications into per-process channels. Same as
for the event sys
4. Local and remote backends now both implement ExecBackend.
5. Local backend wraps internal process ID/operations into per-process
ExecProcess objects.
6. Remote backend registers a session channel before launch and
unregisters on failed launch.
## What changed in `unified_exec`
1. Added unified process-state model and backend-neutral process
wrapper. This will probably disappear in the future, but it makes it
easier to keep the work flowing on both side.
- `UnifiedExecProcess` now handles both local PTY sessions and remote
exec-server processes through a shared `ProcessHandle`.
- Added `ProcessState` to track has_exited, exit_code, and terminal
failure message consistently across backends.
2. Routed write and lifecycle handling through process-level methods.
## Some rationals
1. The change centralizes execution transport in exec-server while
preserving policy and orchestration ownership in core, avoiding
duplicated launch approval logic. This comes from internal discussion.
2. Session-scoped events remove coupling/cross-talk between processes
and make stream ordering and terminal state explicit (seq, closed,
failed).
3. The failure-path surfacing (remote launch failures, write failures,
transport disconnects) makes command tool output and cleanup behavior
deterministic
## Follow-ups:
* Unify the concept of thread ID behind an obfuscated struct
* FD handling
* Full zsh-fork compatibility
* Full network sandboxing compatibility
* Handle ws disconnection
Fixes#15283.
## Summary
Older system bubblewrap builds reject `--argv0`, which makes our Linux
sandbox fail before the helper can re-exec. This PR keeps using system
`/usr/bin/bwrap` whenever it exists and only falls back to vendored
bwrap when the system binary is missing. That matters on stricter
AppArmor hosts, where the distro bwrap package also provides the policy
setup needed for user namespaces.
For old system bwrap, we avoid `--argv0` instead of switching binaries:
- pass the sandbox helper a full-path `argv0`,
- keep the existing `current_exe() + --argv0` path when the selected
launcher supports it,
- otherwise omit `--argv0` and re-exec through the helper's own
`argv[0]` path, whose basename still dispatches as
`codex-linux-sandbox`.
Also updates the launcher/warning tests and docs so they match the new
behavior: present-but-old system bwrap uses the compatibility path, and
only absent system bwrap falls back to vendored.
### Validation
1. Install Ubuntu 20.04 in a VM
2. Compile codex and run without bubblewrap installed - see a warning
about falling back to the vendored bwrap
3. Install bwrap and verify version is 0.4.0 without `argv0` support
4. run codex and use apply_patch tool without errors
<img width="802" height="631" alt="Screenshot 2026-03-25 at 11 48 36 PM"
src="https://github.com/user-attachments/assets/77248a29-aa38-4d7c-9833-496ec6a458b8"
/>
<img width="807" height="634" alt="Screenshot 2026-03-25 at 11 47 32 PM"
src="https://github.com/user-attachments/assets/5af8b850-a466-489b-95a6-455b76b5050f"
/>
<img width="812" height="635" alt="Screenshot 2026-03-25 at 11 45 45 PM"
src="https://github.com/user-attachments/assets/438074f0-8435-4274-a667-332efdd5cb57"
/>
<img width="801" height="623" alt="Screenshot 2026-03-25 at 11 43 56 PM"
src="https://github.com/user-attachments/assets/0dc8d3f5-e8cf-4218-b4b4-a4f7d9bf02e3"
/>
---------
Co-authored-by: Michael Bolin <mbolin@openai.com>
For app-server websocket auth, support the two server-side mechanisms
from
PR #14847:
- `--ws-auth capability-token --ws-token-file /abs/path`
- `--ws-auth signed-bearer-token --ws-shared-secret-file /abs/path`
with optional `--ws-issuer`, `--ws-audience`, and
`--ws-max-clock-skew-seconds`
On the client side, add interactive remote support via:
- `--remote ws://host:port` or `--remote wss://host:port`
- `--remote-auth-token-env <ENV_VAR>`
Codex reads the bearer token from the named environment variable and
sends it
as `Authorization: Bearer <token>` during the websocket handshake.
Remote auth
tokens are only allowed for `wss://` URLs or loopback `ws://` URLs.
Testing:
- tested both auth methods manually to confirm connection success and
rejection for both auth types
When `tui_app_server` is enabled, shell commands in the transcript
render as fully quoted invocations like `/bin/zsh -lc "..."`. The
non-app-server TUI correctly shows the parsed command body.
Root cause:
The app-server stores `ThreadItem::CommandExecution.command` as a
shell-quoted string. When `tui_app_server` bridges that item back into
the exec renderer, it was passing `vec![command]` unchanged instead of
splitting the string back into argv. That prevented
`strip_bash_lc_and_escape()` from recognizing the shell wrapper, so the
renderer displayed the wrapper literally.
Solution:
Add a shared command-string splitter that round-trips shell-quoted
commands back into argv when it is safe to do so, while preserving
non-roundtrippable inputs as a single string. Use that helper everywhere
`tui_app_server` reconstructs exec commands from app-server payloads,
including live command-execution items, replayed thread items, and exec
approval requests. This restores the same command display behavior as
the direct TUI path without breaking Windows-style commands that cannot
be safely round-tripped.
CHAINED PR - note that base is eternal/hooks-pretooluse-bash, not main
-- so the following PR should be first
Matching post-tool hook to the pre-tool functionality here:
https://github.com/openai/codex/pull/15211
So, PreToolUse calls for plain shell calls, allows blocking. This
PostToolUse call runs after the command executed
example run:
```
› as a test, run in parallel the following commands:
- echo 'one'
- echo '[block-pre-tool-use]'
- echo '[block-post-tool-use]'
⚠ MCP startup incomplete (failed: notion, linear)
• Cruising through those three commands in parallel now, and I’ll share the exact outputs right after
they land.
• Running PreToolUse hook: checking the observatory runes
• Running PreToolUse hook: checking the observatory runes
• Running PreToolUse hook: checking the observatory runes
PreToolUse hook (blocked)
warning: wizard-tower PreToolUse demo blocked a Bash command on purpose.
feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
PreToolUse hook (completed)
warning: wizard-tower PreToolUse demo inspected Bash: echo 'one'
PreToolUse hook (completed)
warning: wizard-tower PreToolUse demo inspected Bash: echo '[block-post-tool-use]'
• Ran echo '[block-post-tool-use]'
└ [block-post-tool-use]
• Ran echo 'one'
└ one
• Running PostToolUse hook: reading the observatory aftermath
• Running PostToolUse hook: reading the observatory aftermath
PostToolUse hook (blocked)
warning: wizard-tower PostToolUse demo blocked the Bash follow-through on purpose.
feedback: PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to
continue.
PostToolUse hook (completed)
warning: wizard-tower PostToolUse demo inspected Bash: echo 'one'
hook context: PostToolUse demo saw the Bash result. Response preview: one
──────────────────────────────────────────────────────────────────────────────────────────────────────
• Ran them in parallel. Little wave report:
- echo 'one' completed and returned one.
- echo '[block-pre-tool-use]' was blocked before execution by the PreToolUse hook.
- echo '[block-post-tool-use]' ran, but its result was blocked afterward by the PostToolUse hook.
The hook messages were:
- PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
- PostToolUse demo blocked the result after execution. Remove [block-post-tool-use] to continue.
› for the blocked messages (correctly blocked in this test) did the tool output land back into
context?
• Yeah, from this run:
- PreToolUse block: the command itself did not execute, so no command output landed in context. Only
the block message did.
- PostToolUse block: the command did execute, but the actual stdout did not come back into context.
Only the post-tool block message did.
So the blocked notifications landed in context, but the blocked command payloads themselves did not.
Nice clean guardrail, aloha.
```
Add environment manager that is a singleton and is created early in
app-server (before skill manager, before config loading).
Use an environment variable to point to a running exec server.
## Summary
- add an explicit `mcp.tools.call` span around MCP tool execution in
core
- keep MCP span validation local to `mcp_tool_call_tests` instead of
broadening the integration test suite
- inline the turn/session correlation fields directly in the span
initializer
## Included Changes
- `codex-rs/core/src/mcp_tool_call.rs`: wrap the existing MCP tool call
in `mcp.tools.call` and inline `conversation.id`, `session.id`, and
`turn.id` in the span initializer
- `codex-rs/core/src/mcp_tool_call_tests.rs`: assert the MCP span
records the expected correlation and server fields
## Testing
- `cargo test -p codex-core`
- `just fmt`
## Notes
- `cargo test -p codex-core` still hits existing unrelated failures in
guardian-config tests and the sandboxed JS REPL `mktemp` test
- metric work moved to stacked PR #15792
- transport-level RMCP spans and trace propagation remain in stacked PR
#15792
- full workspace `cargo test` was not run
---------
Co-authored-by: Codex <noreply@openai.com>
I've seen several intermittent failures of
`get_auth_status_returns_token_after_proactive_refresh_recovery` today.
I investigated, and I found a couple of issues.
First, `getAuthStatus(refreshToken=true)` could refresh twice in one
request: once via `refresh_token_if_requested()` and again via the
proactive refresh path inside `auth_manager.auth()`. In the
permanent-failure case this produced an extra `/oauth/token` call and
made the app-server auth tests flaky. Use `auth_cached()` after an
explicit refresh request so the handler reuses the post-refresh auth
state instead of immediately re-entering proactive refresh logic. Keep
the existing proactive path for `refreshToken=false`.
Second, serialize auth refresh attempts in `AuthManager` have a
startup/request race. One proactive refresh could already be in flight
while a `getAuthStatus(refreshToken=false)` request entered
`auth().await`, causing a second `/oauth/token` call before the first
failure or refresh result had been recorded. Guarding the refresh flow
with a single async lock makes concurrent callers share one refresh
result, which prevents duplicate refreshes and stabilizes the
proactive-refresh auth tests.
## Summary
- move skill loading and management into codex-core-skills
- leave codex-core with the thin integration layer and shared wiring
## Testing
- CI
---------
Co-authored-by: Codex <noreply@openai.com>
## TL;DR
When running codex with `-c features.tui_app_server=true` we see
corruption when streaming large amounts of data. This PR marks other
event types as _critical_ by making them _must-deliver_.
## Problem
When the TUI consumer falls behind the app-server event stream, the
bounded `mpsc` channel fills up and the forwarding layer drops events
via `try_send`. Previously only `TurnCompleted` was marked as
must-deliver. Streamed assistant text (`AgentMessageDelta`) and the
authoritative final item (`ItemCompleted`) were treated as droppable —
the same as ephemeral command output deltas. Because the TUI renders
markdown incrementally from these deltas, dropping any of them produces
permanently corrupted or incomplete paragraphs that persist for the rest
of the session.
## Mental model
The app-server event stream has two tiers of importance:
1. **Lossless (transcript + terminal):** Events that form the
authoritative record of what the assistant said or that signal turn
lifecycle transitions. Losing any of these corrupts the visible output
or leaves surfaces waiting forever. These are: `AgentMessageDelta`,
`PlanDelta`, `ReasoningSummaryTextDelta`, `ReasoningTextDelta`,
`ItemCompleted`, and `TurnCompleted`.
2. **Best-effort (everything else):** Ephemeral status events like
`CommandExecutionOutputDelta` and progress notifications. Dropping these
under load causes cosmetic gaps but no permanent corruption.
The forwarding layer uses `try_send` for best-effort events (dropping on
backpressure) and blocking `send().await` for lossless events (applying
back-pressure to the producer until the consumer catches up).
## Non-goals
- Eliminating backpressure entirely. The bounded queue is intentional;
this change only widens the set of events that survive it.
- Changing the event protocol or adding new notification types.
- Addressing root causes of consumer slowness (e.g. TUI render cost).
## Tradeoffs
Blocking on transcript events means a slow consumer can now stall the
producer for the duration of those events. This is acceptable because:
(a) the alternative is permanently broken output, which is worse; (b)
the consumer already had to keep up with `TurnCompleted` blocking sends;
and (c) transcript events arrive at model-output speed, not burst speed,
so sustained saturation is unlikely in practice.
## Architecture
Two parallel changes, one per transport:
- **In-process path** (`lib.rs`): The inline forwarding logic was
extracted into `forward_in_process_event`, a standalone async function
that encapsulates the lag-marker / must-deliver / try-send decision
tree. The worker loop now delegates to it. A new
`server_notification_requires_delivery` function (shared `pub(crate)`)
centralizes the notification classification.
- **Remote path** (`remote.rs`): The local `event_requires_delivery` now
delegates to the same shared `server_notification_requires_delivery`,
keeping both transports in sync.
## Observability
No new metrics or log lines. The existing `warn!` on event drops
continues to fire for best-effort events. Lossless events that block
will not produce a log line (they simply wait).
## Tests
- `event_requires_delivery_marks_transcript_and_terminal_events`: unit
test confirming the expanded classification covers `AgentMessageDelta`,
`ItemCompleted`, `TurnCompleted`, and excludes
`CommandExecutionOutputDelta` and `Lagged`.
-
`forward_in_process_event_preserves_transcript_notifications_under_backpressure`:
integration-style test that fills a capacity-1 channel, verifies a
best-effort event is dropped (skipped count increments), then sends
lossless transcript events and confirms they all arrive in order with
the correct lag marker preceding them.
- `remote_backpressure_preserves_transcript_notifications`: end-to-end
test over a real websocket that verifies the remote transport preserves
transcript events under the same backpressure scenario.
- `event_requires_delivery_marks_transcript_and_disconnect_events`
(remote): unit test confirming the remote-side classification covers
transcript events and `Disconnected`.
---------
Co-authored-by: Eric Traut <etraut@openai.com>
## Summary
This change adds websocket authentication at the app-server transport
boundary and enforces it before JSON-RPC `initialize`, so authenticated
deployments reject unauthenticated clients during the websocket
handshake rather than after a connection has already been admitted.
During rollout, websocket auth is opt-in for non-loopback listeners so
we do not break existing remote clients. If `--ws-auth ...` is
configured, the server enforces auth during websocket upgrade. If auth
is not configured, non-loopback listeners still start, but app-server
logs a warning and the startup banner calls out that auth should be
configured before real remote use.
The server supports two auth modes: a file-backed capability token, and
a standard HMAC-signed JWT/JWS bearer token verified with the
`jsonwebtoken` crate, with optional issuer, audience, and clock-skew
validation. Capability tokens are normalized, hashed, and compared in
constant time. Short shared secrets for signed bearer tokens are
rejected at startup. Requests carrying an `Origin` header are rejected
with `403` by transport middleware, and authenticated clients present
credentials as `Authorization: Bearer <token>` during websocket upgrade.
## Validation
- `cargo test -p codex-app-server transport::auth`
- `cargo test -p codex-cli app_server_`
- `cargo clippy -p codex-app-server --all-targets -- -D warnings`
- `just bazel-lock-check`
Note: in the broad `cargo test -p codex-app-server
connection_handling_websocket` run, the touched websocket auth cases
passed, but unrelated Unix shutdown tests failed with a timeout in this
environment.
---------
Co-authored-by: Eric Traut <etraut@openai.com>
## TL;DR
This PR changes the `tui_app_server` _path_ in the following ways:
- add missing feature to show agent names (shows only UUIDs today)
- add `Cmd/Alt+Arrows` navigation between agent conversations
## Problem
When the TUI connects to a remote app server, collab agent tool-call
items (spawn, wait, delegate, etc.) render thread UUIDs instead of
human-readable agent names because the `ChatWidget` never receives
nickname/role metadata for receiver threads. Separately, keyboard
next/previous agent navigation silently does nothing when the local
`AgentNavigationState` cache has not yet been populated with subagent
threads that the remote server already knows about.
Both issues share a root cause: in the remote (app-server) code path the
TUI never proactively fetches thread metadata. In the local code path
this metadata arrives naturally via spawn events the TUI itself
orchestrates, but in the remote path those events were processed by a
different client and the TUI only sees the resulting collab tool-call
notifications.
## Mental model
Collab agent tool-call notifications reference receiver threads by id,
but carry no nickname or role. The TUI needs that metadata in two
places:
1. **Rendering** -- `ChatWidget` converts `CollabAgentToolCall` items
into history cells. Without metadata, agent status lines show raw UUIDs.
2. **Navigation** -- `AgentNavigationState` tracks known threads for the
`/agent` picker and keyboard cycling. Without entries for remote
subagents, next/previous has nowhere to go.
This change closes the gap with two complementary strategies:
- **Eager hydration**: when any notification carries
`receiver_thread_ids`, the TUI fetches metadata (`thread/read`) for
threads it has not yet cached before the notification is rendered.
- **Backfill on thread switch**: when the user resumes, forks, or starts
a new app-server thread, the TUI fetches the full `thread/loaded/list`,
walks the parent-child spawn tree, and registers every descendant
subagent in both the navigation cache and the `ChatWidget` metadata map.
A new `collab_agent_metadata` side-table in `ChatWidget` stores
nickname/role keyed by `ThreadId`, kept in sync by `App` whenever it
calls `upsert_agent_picker_thread`. The `replace_chat_widget` helper
re-seeds this map from `AgentNavigationState` so that thread switches
(which reconstruct the widget) do not lose previously discovered
metadata.
## Non-goals
- This change does not alter the local (non-app-server) collab code
path. That path already receives metadata via spawn events and is
unaffected.
- No new protocol messages are introduced. The change uses existing
`thread/read` and `thread/loaded/list` RPCs.
- No changes to how `AgentNavigationState` orders or cycles through
threads. The traversal logic is unchanged; only the population of
entries is extended.
## Tradeoffs
- **Extra RPCs on notification path**:
`hydrate_collab_agent_metadata_for_notification` issues a `thread/read`
for each unknown receiver thread before the notification is forwarded to
rendering. This adds latency on the notification path but only fires
once per thread (the result is cached). The alternative -- rendering
first and backfilling names later -- would cause visible flicker as
UUIDs are replaced with names.
- **Backfill fetches all loaded threads**:
`backfill_loaded_subagent_threads` fetches the full loaded-thread list
and walks the spawn tree even when the user may only care about one
subagent. This is simple and correct but O(loaded_threads) per thread
switch. For typical session sizes this is negligible; it could become a
concern for sessions with hundreds of subagents.
- **Metadata duplication**: agent nickname/role is now stored in both
`AgentNavigationState` (for picker/label) and
`ChatWidget::collab_agent_metadata` (for rendering). The two are kept in
sync through `upsert_agent_picker_thread` and `replace_chat_widget`, but
there is no compile-time enforcement of this coupling.
## Architecture
### New module: `app::loaded_threads`
Pure function `find_loaded_subagent_threads_for_primary` that takes a
flat list of `Thread` objects and a primary thread id, then walks the
`SessionSource::SubAgent` parent-child edges to collect all transitive
descendants. Returns a sorted vec of `LoadedSubagentThread` (thread_id +
nickname + role). No async, no side effects -- designed for unit
testing.
### New methods on `App`
| Method | Purpose |
|--------|---------|
| `collab_receiver_thread_ids` | Extracts `receiver_thread_ids` from
`ItemStarted` / `ItemCompleted` collab notifications |
| `hydrate_collab_agent_metadata_for_notification` | Fetches and caches
metadata for unknown receiver threads before a notification is rendered
|
| `backfill_loaded_subagent_threads` | Bulk-fetches all loaded threads
and registers descendants of the primary thread |
| `adjacent_thread_id_with_backfill` | Attempts navigation, falls back
to backfill if the cache has no adjacent entry |
| `replace_chat_widget` | Replaces the widget and re-seeds its metadata
map from `AgentNavigationState` |
### New state in `ChatWidget`
`collab_agent_metadata: HashMap<ThreadId, CollabAgentMetadata>` -- a
lookup table that rendering functions consult to attach human-readable
names to collab tool-call items. Populated externally by `App` via
`set_collab_agent_metadata`.
### New method on `AppServerSession`
`thread_loaded_list` -- thin wrapper around
`ClientRequest::ThreadLoadedList`.
## Observability
- `tracing::warn` on invalid thread ids during hydration and backfill.
- `tracing::warn` on failed `thread/read` or `thread/loaded/list` RPCs
(with thread id and error).
- No new metrics or feature flags.
## Tests
-
**`loaded_threads::tests::finds_loaded_subagent_tree_for_primary_thread`**
-- unit test for the spawn-tree walk: verifies child and grandchild are
included, unrelated threads are excluded, and metadata is carried
through.
-
**`app::tests::replace_chat_widget_reseeds_collab_agent_metadata_for_replay`**
-- integration test that creates a `ChatWidget`, replaces it via
`replace_chat_widget`, replays a collab wait notification, and asserts
the rendered history cell contains the agent name rather than a UUID.
- **Updated snapshot** `app_server_collab_wait_items_render_history` --
the existing collab wait rendering test now sets metadata before sending
notifications, so the snapshot shows `Robie [explorer]` / `Ada
[reviewer]` instead of raw thread ids.
---------
Co-authored-by: Eric Traut <etraut@openai.com>
## Summary
Add the follow up code comment Michael asked for at the MDM
`managed_config_from_mdm` - a follow up from
https://github.com/openai/codex/pull/15351.
## Validation
1. `cargo fmt --all --check`
2. `cargo test -p codex-core
managed_preferences_expand_home_directory_in_workspace_write_roots --
--nocapture`
3. `cargo test -p codex-core
write_value_succeeds_when_managed_preferences_expand_home_directory_paths
-- --nocapture`
4. `./tools/argument-comment-lint/run-prebuilt-linter.sh -p codex-core`
## Summary
- move the analytics events client into codex-analytics
- update codex-core and app-server callsites to use the new crate
## Testing
- CI
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- extract plugin identifiers and load-outcome types into codex-plugin
- update codex-core to consume the new plugin crate
## Testing
- CI
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add `codex resume --include-non-interactive` to include
non-interactive sessions in the picker and `--last`
- keep current-provider and cwd filtering behavior unchanged
- replace the picker API boolean with a `SessionSourceFilter` enum to
avoid a boolean trap
## Tests
- `cargo test -p codex-cli`
- `cargo test -p codex-tui`
- `just fmt`
- `just fix -p codex-cli`
- `just fix -p codex-tui`
## Summary
- extract instruction fragment and user-instruction types into
codex-instructions
- update codex-core to consume the new crate
## Testing
- CI
---------
Co-authored-by: Codex <noreply@openai.com>
## TL;DR
Fix duplicated reasoning summaries in `tui_app_server`.
<img width="1716" height="912" alt="image"
src="https://github.com/user-attachments/assets/6362f25a-ab1c-4a01-bf10-b5616c9428c2"
/>
During live turns, reasoning text is already rendered incrementally from
`ReasoningSummaryTextDelta`. When the same reasoning item later arrives
via `ItemCompleted`, we should only finalize the reasoning block, not
render the same summary again.
## What changed
- only replay rendered reasoning summaries from completed
`ThreadItem::Reasoning` items
- kept live completed reasoning items as finalize-only
- added a regression test covering the live streaming + completion path
## Why
Without this, the first reasoning summary often appears twice in the
transcript when `model_reasoning_summary = "detailed"` and
`features.tui_app_server = true`.
Migrate `cwd` and related session/config state to `AbsolutePathBuf` so
downstream consumers consistently see absolute working directories.
Add test-only `.abs()` helpers for `Path`, `PathBuf`, and `TempDir`, and
update branch-local tests to use them instead of
`AbsolutePathBuf::try_from(...)`.
For the remaining TUI/app-server snapshot coverage that renders absolute
cwd values, keep the snapshots unchanged and skip the Windows-only cases
where the platform-specific absolute path layout differs.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
---------
Co-authored-by: Codex <noreply@openai.com>
- Changed `requires_mcp_tool_approval` to apply MCP spec defaults when
annotations are missing.
- Unannotated tools now default to:
- `readOnlyHint = false`
- `destructiveHint = true`
- `openWorldHint = true`
- This means unannotated MCP tools now go through approval/ARC
monitoring instead of silently bypassing it.
- Explicitly read-only tools still skip approval unless they are also
explicitly marked destructive.
**Previous behavior**
Failed open for missing annotations, which was unsafe for custom MCP
tools that omitted or forgot annotations.
---------
Co-authored-by: colby-oai <228809017+colby-oai@users.noreply.github.com>
This PR adds code to recover from a narrow app-server timing race where
a follow-up can be sent after the previous turn has already ended but
before the TUI has observed that completion.
Instead of surfacing turn/steer failed: no active turn to steer, the
client now treats that as a stale active-turn cache and falls back to
starting a fresh turn, matching the intended submit behavior more
closely. This is similar to the strategy employed by other app server
clients (notably, the IDE extension and desktop app).
This race exists because the current app-server API makes the client
choose between two separate RPCs, turn/steer and turn/start, based on
its local view of whether a turn is still active. That view is
replicated from asynchronous notifications, so it can be stale for a
brief window. The server may already have ended the turn while the
client still believes it is in progress. Since the choice is made
client-side rather than atomically on the server, tui_app_server can
occasionally send turn/steer for a turn that no longer exists.
## Summary
- keep legacy Windows restricted-token sandboxing as the supported
baseline
- support the split-policy subset that restricted-token can enforce
directly today
- support full-disk read, the same writable root set as legacy
`WorkspaceWrite`, and extra read-only carveouts under those writable
roots via additional deny-write ACLs
- continue to fail closed for unsupported split-only shapes, including
explicit unreadable (`none`) carveouts, reopened writable descendants
under read-only carveouts, and writable root sets that do not match the
legacy workspace roots
## Example
Given a filesystem policy like:
```toml
":root" = "read"
":cwd" = "write"
"./docs" = "read"
```
the restricted-token backend can keep the workspace writable while
denying writes under `docs` by layering an extra deny-write carveout on
top of the legacy workspace-write roots.
A policy like:
```toml
"/workspace" = "write"
"/workspace/docs" = "read"
"/workspace/docs/tmp" = "write"
```
still fails closed, because the unelevated backend cannot reopen the
nested writable descendant safely.
## Stack
-> fix: support split carveouts in windows restricted-token sandbox
#14172
fix: support split carveouts in windows elevated sandbox #14568
TL;DR: update the quickstart integration assertion to match the current
example output.
- replace the stale `Status:` expectation for
`01_quickstart_constructor` with `Server:`, `Items:`, and `Text:`
- keep the existing guard against `Server: unknown`
## Summary
- remove the fork-startup `build_initial_context` injection
- keep the reconstructed `reference_context_item` as the fork baseline
until the first real turn
- update fork-history tests and the request snapshot, and add a
`TODO(ccunningham)` for remaining nondiffable initial-context inputs
## Why
Fork startup was appending current-session initial context immediately
after reconstructing the parent rollout, then the first real turn could
emit context updates again. That duplicated model-visible context in the
child rollout.
## Impact
Forked sessions now behave like resume for context seeding: startup
reconstructs history and preserves the prior baseline, and the first
real turn handles any current-session context emission.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- Reuse the existing config path resolver for the macOS MDM managed
preferences layer so `writable_roots = ["~/code"]` expands the same way
as file-backed config
- keep the change scoped to the MDM branch in `config_loader`; the
current net diff is only `config_loader/mod.rs` plus focused regression
tests in `config_loader/tests.rs` and `config/service_tests.rs`
- research note: `resolve_relative_paths_in_config_toml(...)` is already
used in several existing configuration paths, including [CLI
overrides](74fda242d3/codex-rs/core/src/config_loader/mod.rs (L152-L163)),
[file-backed managed
config](74fda242d3/codex-rs/core/src/config_loader/mod.rs (L274-L285)),
[normal config-file
loading](74fda242d3/codex-rs/core/src/config_loader/mod.rs (L311-L331)),
[project `.codex/config.toml`
loading](74fda242d3/codex-rs/core/src/config_loader/mod.rs (L863-L865)),
and [role config
loading](74fda242d3/codex-rs/core/src/agent/role.rs (L105-L109))
## Validation
- `cargo fmt --all --check`
- `cargo test -p codex-core
managed_preferences_expand_home_directory_in_workspace_write_roots --
--nocapture`
- `cargo test -p codex-core
write_value_succeeds_when_managed_preferences_expand_home_directory_paths
-- --nocapture`
---------
Co-authored-by: Michael Bolin <mbolin@openai.com>
Co-authored-by: Michael Bolin <bolinfest@gmail.com>
- Removes provenance filtering in the mentions feature for apps and
skills that were installed as part of a plugin.
- All skills and apps for a plugin are mentionable with this change.
## Why
This is a follow-up to #15360. That change fixed the `arg0` helper
setup, but `rmcp-client` still coerced stdio transport environment
values into UTF-8 `String`s before program resolution and process spawn.
If `PATH` or another inherited environment value contains non-UTF-8
bytes, that loses fidelity before it reaches `which` and `Command`.
## What changed
- change `create_env_for_mcp_server()` to return `HashMap<OsString,
OsString>` and read inherited values with `std::env::var_os()`
- change `TransportRecipe::Stdio.env`, `RmcpClient::new_stdio_client()`,
and `program_resolver::resolve()` to keep stdio transport env values in
`OsString` form within `rmcp-client`
- keep the `codex-core` config boundary stringly, but convert configured
stdio env values to `OsString` once when constructing the transport
- update the rmcp-client stdio test fixtures and callers to use
`OsString` env maps
- add a Unix regression test that verifies `create_env_for_mcp_server()`
preserves a non-UTF-8 `PATH`
## How to verify
- `cargo test -p codex-rmcp-client`
- `cargo test -p codex-core mcp_connection_manager`
- `just argument-comment-lint`
Targeted coverage in this change includes
`utils::tests::create_env_preserves_path_when_it_is_not_utf8`, while the
updated stdio transport path is exercised by the existing rmcp-client
tests that construct `RmcpClient::new_stdio_client()`.
### Summary
Add the v2 app-server filesystem watch RPCs and notifications, wire them
through the message processor, and implement connection-scoped watches
with notify-backed change delivery. This also updates the schema
fixtures, app-server documentation, and the v2 integration coverage for
watch and unwatch behavior.
This allows clients to efficiently watch for filesystem updates, e.g. to
react on branch changes.
### Testing
- exercise watch lifecycles for directory changes, atomic file
replacement, missing-file targets, and unwatch cleanup
- move the shared byte-based middle truncation logic from `core` into
`codex-utils-string`
- keep token-specific truncation in `codex-core` so rollout can reuse
the shared helper in the next stacked PR
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- drop `sandbox_permissions` from the sandboxing `ExecOptions` and
`ExecRequest` adapter types
- remove the now-unused plumbing from shell, unified exec, JS REPL, and
apply-patch runtime call sites
- default reconstructed `ExecParams` to `SandboxPermissions::UseDefault`
where the lower-level API still requires the field
## Testing
- `just fmt`
- `just argument-comment-lint`
- `cargo test -p codex-core` (still running locally; first failures
observed in `suite::cli_stream::responses_mode_stream_cli`,
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override`,
and
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_env_fallback`)
Switch plugin-install background MCP OAuth to a silent login path so the
raw authorization URL is no longer printed in normal success cases.
OAuth behavior is otherwise unchanged, with fallback URL output via
stderr still shown only if browser launch fails.
Before:
https://github.com/user-attachments/assets/4bf387af-afa8-4b83-bcd6-4ca6b55da8db
## Summary
Fixes slow `Ctrl+C` exit from the ChatGPT browser-login screen in
`tui_app_server`.
## Root cause
Onboarding-level `Ctrl+C` quit bypassed the auth widget's cancel path.
That let the active ChatGPT login keep running, and in-process
app-server shutdown then waited on the stale login attempt before
finishing.
## Changes
- Extract a shared `cancel_active_attempt()` path in the auth widget
- Use that path from onboarding-level `Ctrl+C` before exiting the TUI
- Add focused tests for canceling browser-login and device-code attempts
- Add app-server shutdown cleanup that explicitly drops any active login
before draining background work
## Summary
Fixes ChatGPT login in `tui_app_server` so the local browser opens again
during in-process login flows.
## Root cause
The app-server backend intentionally starts ChatGPT login with browser
auto-open disabled, expecting the TUI client to open the returned
`auth_url`. The app-server TUI was not doing that, so the login URL was
shown in the UI but no browser window opened.
## Changes
- Add a helper that opens the returned ChatGPT login URL locally
- Call it from the main ChatGPT login flow
- Call it from the device-code fallback-to-browser path as well
- Limit auto-open to in-process app-server handles so remote sessions do
not try to open a browser against a remote localhost callback
## Summary
Fixes early TUI exit paths that could leave the terminal in a dirty
state and cause a stray `%` prompt marker after the app quit.
## Root cause
Both `tui` and `tui_app_server` had early returns after `tui::init()`
that did not guarantee terminal restore. When that happened, shells like
`zsh` inherited the altered terminal state.
## Changes
- Add a restore guard around `run_ratatui_app()` in both `tui` and
`tui_app_server`
- Route early exits through the guard instead of relying on scattered
manual restore calls
- Ensure terminal restore still happens on normal shutdown
- Remove marketplace from left column.
- Change `Can be installed` to `Available`
- Align right-column marketplace + selected-row hint text across states.
- Changes applied to both `tui` and `tui_app_server`.
- Update related snapshots/tests.
<img width="2142" height="590" alt="image"
src="https://github.com/user-attachments/assets/6e60b783-2bea-46d4-b353-f2fd328ac4d0"
/>
- create `codex-git-utils` and move the shared git helpers into it with
file moves preserved for diff readability
- move the `GitInfo` helpers out of `core` so stacked rollout work can
depend on the shared crate without carrying its own git info module
---------
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
## Summary
Fixes a `tui_app_server` bootstrap failure when launching the CLI while
logged out.
## Root cause
During TUI bootstrap, `tui_app_server` fetched `account/rateLimits/read`
unconditionally and treated failures as fatal. When the user was logged
out, there was no ChatGPT account available, so that RPC failed and
aborted startup with:
```
Error: account/rateLimits/read failed during TUI bootstrap
```
## Changes
- Only fetch bootstrap rate limits when OpenAI auth is required and a
ChatGPT account is present
- Treat bootstrap rate-limit fetch failures as non-fatal and fall back
to empty snapshots
- Log the fetch failure at debug level instead of aborting startup
## Why
`shell-tool-mcp` and the Bash fork are no longer needed, but the patched
zsh fork is still relevant for shell escalation and for the
DotSlash-backed zsh-fork integration tests.
Deleting the old `shell-tool-mcp` workflow also deleted the only
pipeline that rebuilt those patched zsh binaries. This keeps the package
removal, while preserving a small release path that can be reused
whenever `codex-rs/shell-escalation/patches/zsh-exec-wrapper.patch`
changes.
## What changed
- removed the `shell-tool-mcp` workspace package, its npm
packaging/release jobs, the Bash test fixture, and the remaining
Bash-specific compatibility wiring
- deleted the old `.github/workflows/shell-tool-mcp.yml` and
`.github/workflows/shell-tool-mcp-ci.yml` workflows now that their
responsibilities have been replaced or removed
- kept the zsh patch under
`codex-rs/shell-escalation/patches/zsh-exec-wrapper.patch` and updated
the `codex-rs/shell-escalation` docs/code to describe the zsh-based flow
directly
- added `.github/workflows/rust-release-zsh.yml` to build only the three
zsh binaries that `codex-rs/app-server/tests/suite/zsh` needs today:
- `aarch64-apple-darwin` on `macos-15`
- `x86_64-unknown-linux-musl` on `ubuntu-24.04`
- `aarch64-unknown-linux-musl` on `ubuntu-24.04`
- extracted the shared zsh build/smoke-test/stage logic into
`.github/scripts/build-zsh-release-artifact.sh`, made that helper
directly executable, and now invoke it directly from the workflow so the
Linux and macOS jobs only keep the OS-specific setup in YAML
- wired those standalone `codex-zsh-*.tar.gz` assets into
`rust-release.yml` and added `.github/dotslash-zsh-config.json` so
releases also publish a `codex-zsh` DotSlash file
- updated the checked-in `codex-rs/app-server/tests/suite/zsh` fixture
comments to explain that new releases come from the standalone zsh
assets, while the checked-in fixture remains pinned to the latest
historical release until a newer zsh artifact is published
- tightened a couple of follow-on cleanups in
`codex-rs/shell-escalation`: the `ExecParams::command` comment now
describes the shell `-c`/`-lc` string more clearly, and the README now
points at the same `git.code.sf.net` zsh source URL that the workflow
uses
## Testing
- `cargo test -p codex-shell-escalation`
- `just argument-comment-lint`
- `bash -n .github/scripts/build-zsh-release-artifact.sh`
- attempted `cargo test -p codex-core`; unrelated existing failures
remain, but the touched `tools::runtimes::shell::unix_escalation::*`
coverage passed during that run
- Remove numeric prefixes for disabled rows in shared list rendering.
These numbers are shortcuts, Ex: Pressing "2" selects option `#2`.
Disabled items can not be selected, so keeping numbers on these items is
misleading.
- Apply the same behavior in both tui and tui_app_server.
- Update affected snapshots for apps/plugins loading and plugin detail
rows.
_**This is a global change.**_
Before:
<img width="1680" height="488" alt="image"
src="https://github.com/user-attachments/assets/4bcf94ad-285f-48d3-a235-a85b58ee58e2"
/>
After:
<img width="1706" height="484" alt="image"
src="https://github.com/user-attachments/assets/76bb6107-a562-42fe-ae94-29440447ca77"
/>
## Summary
- trim contiguous developer/contextual-user pre-turn updates when
rollback cuts back to a user turn
- add a focused history regression test for the trim behavior
- update the rollback request-boundary snapshots to show the fixed
non-duplicating context shape
---------
Co-authored-by: Codex <noreply@openai.com>
built from #14256. PR description from @etraut-openai:
This PR addresses a hole in [PR
11802](https://github.com/openai/codex/pull/11802). The previous PR
assumed that app server clients would respond to token refresh failures
by presenting the user with an error ("you must log in again") and then
not making further attempts to call network endpoints using the expired
token. While they do present the user with this error, they don't
prevent further attempts to call network endpoints and can repeatedly
call `getAuthStatus(refreshToken=true)` resulting in many failed calls
to the token refresh endpoint.
There are three solutions I considered here:
1. Change the getAuthStatus app server call to return a null auth if the
caller specified "refreshToken" on input and the refresh attempt fails.
This will cause clients to immediately log out the user and return them
to the log in screen. This is a really bad user experience. It's also a
breaking change in the app server contract that could break third-party
clients.
2. Augment the getAuthStatus app server call to return an additional
field that indicates the state of "token could not be refreshed". This
is a non-breaking change to the app server API, but it requires
non-trivial changes for all clients to properly handle this new field
properly.
3. Change the getAuthStatus implementation to handle the case where a
token refresh fails by marking the AuthManager's in-memory access and
refresh tokens as "poisoned" so it they are no longer used. This is the
simplest fix that requires no client changes.
I chose option 3.
Here's Codex's explanation of this change:
When an app-server client asks `getAuthStatus(refreshToken=true)`, we
may try to refresh a stale ChatGPT access token. If that refresh fails
permanently (for example `refresh_token_reused`, expired, or revoked),
the old behavior was bad in two ways:
1. We kept the in-memory auth snapshot alive as if it were still usable.
2. Later auth checks could retry refresh again and again, creating a
storm of doomed `/oauth/token` requests and repeatedly surfacing the
same failure.
This is especially painful for app-server clients because they poll auth
status and can keep driving the refresh path without any real chance of
recovery.
This change makes permanent refresh failures terminal for the current
managed auth snapshot without changing the app-server API contract.
What changed:
- `AuthManager` now poisons the current managed auth snapshot in memory
after a permanent refresh failure, keyed to the unchanged `AuthDotJson`.
- Once poisoned, later refresh attempts for that same snapshot fail fast
locally without calling the auth service again.
- The poison is cleared automatically when auth materially changes, such
as a new login, logout, or reload of different auth state from storage.
- `getAuthStatus(includeToken=true)` now omits `authToken` after a
permanent refresh failure instead of handing out the stale cached bearer
token.
This keeps the current auth method visible to clients, avoids forcing an
immediate logout flow, and stops repeated refresh attempts for
credentials that cannot recover.
---------
Co-authored-by: Eric Traut <etraut@openai.com>
Follow up to #15357 by making proactive ChatGPT auth refresh depend on
the access token's JWT expiration instead of treating `last_refresh` age
as the primary source of truth.
* Add
`OutgoingMessageSender::send_server_notification_to_connection_and_wait`
which returns only once message is written to websocket (or failed to do
so)
* Use this mechanism to apply back pressure to stdout/stderr streams of
processes spawned by `command/exec`, to limit them to at most one
message in-memory at a time
* Use back pressure signal to also batch smaller chunks into ≈64KiB ones
This should make commands execution more robust over
high-latency/low-throughput networks
### Summary
Make `FileWatcher` a reusable core component which can be built upon.
Extract skills-related logic into a separate `SkillWatcher`.
Introduce a composable `ThrottledWatchReceiver` to throttle filesystem
events, coalescing affected paths among them.
### Testing
Updated existing unit tests.
- Prefer plugin manifest `interface.displayName` for plugin labels.
- Preserve plugin provenance when handling `list_mcp_tools` so connector
`plugin_display_names` are not clobbered.
- Add a TUI test to ensure plugin-owned app mentions are deduped
correctly.
## Summary
- replace the second-compaction test fixtures with a single ordered
`/responses` sequence
- assert against the real recorded request order instead of aggregating
per-mock captures
- realign the second-summary assertion to the first post-compaction user
turn where the summary actually appears
## Root cause
`compact_resume_after_second_compaction_preserves_history` collected
requests from multiple `mount_sse_once_match` recorders. Overlapping
matchers could record the same HTTP request more than once, so the test
indexed into a duplicated synthetic list rather than the true request
stream. That made the summary assertion depend on matcher evaluation
order and platform-specific behavior.
## Impact
- makes the flaky test deterministic by removing duplicate request
capture from the assertion path
- keeps the change scoped to the test only
## Validation
- `just fmt`
- `just argument-comment-lint`
- `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core
compact_resume_after_second_compaction_preserves_history -- --nocapture`
- repeated the same targeted test 10 times
---------
Co-authored-by: Codex <noreply@openai.com>
Make the inter-agent communication start a turn
As part of this, we disable the v2 notifier to prevent some odd
behaviour where the agent restart working while you're talking to it for
example
Updates plugin ordering so installed plugins are listed first, with
alphabetical sorting applied within the installed and uninstalled
groups. The behavior is now consistent across both `tui` and
`tui_app_server`, and related tests/snapshots were updated.
Bumps
[vedantmgoyal9/winget-releaser](https://github.com/vedantmgoyal9/winget-releaser)
from 19e706d4c9121098010096f9c495a70a7518b30f to
7bd472be23763def6e16bd06cc8b1cdfab0e2fd5.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7bd472be23"><code>7bd472b</code></a>
docs: add description to inputs (<a
href="https://redirect.github.com/vedantmgoyal9/winget-releaser/issues/335">#335</a>)</li>
<li><a
href="a43926ed82"><code>a43926e</code></a>
fix: cargo command not found in <code>ubuntu-slim</code> runner (<a
href="https://redirect.github.com/vedantmgoyal9/winget-releaser/issues/334">#334</a>)</li>
<li>See full diff in <a
href="19e706d4c9...7bd472be23">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This PR completes the conversion of non-interactive `codex exec` to use
app server rather than directly using core events and methods.
### Summary
- move `codex-exec` off exec-owned `AuthManager` and `ThreadManager`
state
- route exec bootstrap, resume, and auth refresh through existing
app-server paths
- replace legacy `codex/event/*` decoding in exec with typed app-server
notification handling
- update human and JSONL exec output adapters to translate existing
app-server notifications only
- clean up "app server client" layer by eliminating support for legacy
notifications; this is no longer needed
- remove exposure of `authManager` and `threadManager` from "app server
client" layer
### Testing
- `exec` has pretty extensive unit and integration tests already, and
these all pass
- In addition, I asked Codex to put together a comprehensive manual set
of tests to cover all of the `codex exec` functionality (including
command-line options), and it successfully generated and ran these tests
## Problem
Today `codex-network-proxy` rejects a global `*` in
`network.allowed_domains`, so there is no static way to configure a
denylist-only posture for public hosts. Users have to enumerate broad
allowlist patterns instead.
## Approach
- Make global wildcard acceptance field-specific: `allowed_domains` can
use `*`, while `denied_domains` still rejects a global wildcard.
- Keep the existing evaluation order, so explicit denies still win first
and local/private protections still apply unless separately enabled.
- Add coverage for the denylist-only behavior and update the README to
document it.
## Validation
- `just fmt`
- `cargo test -p codex-network-proxy` (full run had one unrelated flaky
telemetry test:
`network_policy::tests::emit_block_decision_audit_event_emits_non_domain_event`;
reran in isolation and it passed)
- `cargo test -p codex-network-proxy
network_policy::tests::emit_block_decision_audit_event_emits_non_domain_event
-- --exact --nocapture`
- `just fix -p codex-network-proxy`
- `just argument-comment-lint`
Show all plugin marketplaces in the /plugins popup by removing the
`openai-curated` marketplace filter, and update plugin popup
copy/tests/snapshots to match the new behavior in both TUI codepaths.
## Summary
- move the pure sandbox policy transform helpers from `codex-core` into
`codex-sandboxing`
- move the corresponding unit tests with the extracted implementation
- update `core` and `app-server` callers to import the moved APIs
directly, without re-exports or proxy methods
## Testing
- cargo test -p codex-sandboxing
- cargo test -p codex-core sandboxing
- cargo test -p codex-app-server --lib
- just fix -p codex-sandboxing
- just fix -p codex-core
- just fix -p codex-app-server
- just fmt
- just argument-comment-lint
## Summary
- update the self-serve business usage-based limit message to direct
users to their admin for additional credits
- add a focused unit test for the self_serve_business_usage_based plan
branch
Added also:
If you are at a rate limit but you still have credits, codex cli would
tell you to switch the model. We shouldnt do this if you have credits so
fixed this.
## Test
- launched the source-built CLI and verified the updated message is
shown for the self-serve business usage-based plan

## Summary
- move macOS permission merging/intersection logic and tests from
`codex-core` into `codex-sandboxing`
- move seatbelt policy builders, permissions logic, SBPL assets, and
their tests into `codex-sandboxing`
- keep `codex-core` owning only the seatbelt spawn wrapper and switch
call sites to import the moved APIs directly
## Notes
- no re-exports added
- moved the seatbelt tests with the implementation so internal helpers
could stay private
- local verification is still finishing while this PR is open
## Summary
- add a new `codex-sandboxing` crate for sandboxing extraction work
- move the pure Linux sandbox argv builders and their unit tests out of
`codex-core`
- keep `core::landlock` as the spawn wrapper and update direct callers
to use `codex_sandboxing::landlock`
## Testing
- `cargo test -p codex-sandboxing`
- `cargo test -p codex-core landlock`
- `cargo test -p codex-cli debug_sandbox`
- `just argument-comment-lint`
## Notes
- this is step 1 of the move plan aimed at minimizing per-PR diffs
- no re-exports or no-op proxy methods were added
## Summary
- add `ForkSnapshotMode` to `ThreadManager::fork_thread` so callers can
request either a committed snapshot or an interrupted snapshot
- share the model-visible `<turn_aborted>` history marker between the
live interrupt path and interrupted forks
- update the small set of direct fork callsites to pass
`ForkSnapshotMode::Committed`
Note: this enables /btw to work similarly as Esc to interrupt (hopefully
somewhat in distribution)
---------
Co-authored-by: Codex <noreply@openai.com>
## What changed
- adds a targeted snapshot test for rollback with contextual diffs in
`codex_tests.rs`
- snapshots the exact model-visible request input before the rolled-back
turn and on the follow-up request after rollback
- shows the duplicate developer and environment context pair appearing
again before the follow-up user message
## Why
Rollback currently rewinds the reference context baseline without
rewinding the live session overrides. On the next turn, the same
contextual diff is emitted again and duplicated in the request sent to
the model.
## Impact
- makes the regression visible in a canonical snapshot test
- keeps the snapshot on the shared `context_snapshot` path without
adding new formatting helpers
- gives a direct repro for future fixes to rollback/context
reconstruction
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Adds support for approvals_reviewer to `Op::UserTurn` so we can migrate
`[CodexMessageProcessor::turn_start]` to use Op::UserTurn
## Testing
- [x] Adds quick test for the new field
Co-authored-by: Codex <noreply@openai.com>
Use `serde` to encode the inter agent communication to an assistant
message and use the decode to see if this is such a message
Note: this assume serde on small pattern is fast enough
- add `PreToolUse` hook for bash-like tool execution only at first
- block shell execution before dispatch with deny-only hook behavior
- introduces common.rs matcher framework for matching when hooks are run
example run:
```
› run three parallel echo commands, and the second one should echo "[block-pre-tool-use]" as a test
• Running the three echo commands in parallel now and I’ll report the output directly.
• Running PreToolUse hook: name for demo pre tool use hook
• Running PreToolUse hook: name for demo pre tool use hook
• Running PreToolUse hook: name for demo pre tool use hook
PreToolUse hook (completed)
warning: wizard-tower PreToolUse demo inspected Bash: echo "first parallel echo"
PreToolUse hook (blocked)
warning: wizard-tower PreToolUse demo blocked a Bash command on purpose.
feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
PreToolUse hook (completed)
warning: wizard-tower PreToolUse demo inspected Bash: echo "third parallel echo"
• Ran echo "first parallel echo"
└ first parallel echo
• Ran echo "third parallel echo"
└ third parallel echo
• Three little waves went out in parallel.
1. printed first parallel echo
2. was blocked before execution because it contained the exact test string [block-pre-tool-use]
3. printed third parallel echo
There was also an unrelated macOS defaults warning around the successful commands, but the echoes
themselves worked fine. If you want, I can rerun the second one with a slightly modified string so
it passes cleanly.
```
## Summary
- route /realtime, Ctrl+C, and deleted realtime meters through the same
realtime stop path
- keep generic transcription placeholder cleanup free of realtime
shutdown side effects
## Testing
- Ran
- Relied on CI for verification; did not run local tests
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- capture the last guardian `EventMsg::Error` while waiting for review
completion
- reuse that error as the denial rationale when the review turn
completes without an assessment payload
- add a regression test for the `/responses` HTTP 400 path
## Testing
- `just fmt`
- `cargo test -p codex-core
guardian_review_surfaces_responses_api_errors_in_rejection_reason`
- `just argument-comment-lint -p codex-core`
## Notes
- `cargo test -p codex-core` still fails on the pre-existing unrelated
test
`tools::js_repl::tests::js_repl_imported_local_files_can_access_repl_globals`
in this environment (`mktemp ... Operation not permitted` while
downloading `dotslash`)
Co-authored-by: Codex <noreply@openai.com>
## Summary
Fix a managed ChatGPT auth bug where a stale Codex process could
proactively refresh using an old in-memory refresh token even after
another process had already rotated auth on disk.
This changes the proactive `AuthManager::auth()` path to reuse the
existing guarded `refresh_token()` flow instead of calling the refresh
endpoint directly from cached auth state.
## Original Issue
Users reported repeated `codexd` log lines like:
```text
ERROR codex_core::auth: Failed to refresh token: error sending request for url (https://auth.openai.com/oauth/token)
```
In practice this showed up most often when multiple `codexd` processes
were left running. Killing the extra processes stopped the noise, which
suggested the issue was caused by stale auth state across processes
rather than invalid user credentials.
## Diagnosis
The bug was in the proactive refresh path used by `AuthManager::auth()`:
- Process A could refresh successfully, rotate refresh token `R0` to
`R1`, and persist the updated auth state plus `last_refresh` to disk.
- Process B could keep an older auth snapshot cached in memory, still
holding `R0` and the old `last_refresh`.
- Later, when Process B called `auth()`, it checked staleness from its
cached in-memory auth instead of first reloading from disk.
- Because that cached `last_refresh` was stale, Process B would
proactively call `/oauth/token` with stale refresh token `R0`.
- On failure, `auth()` logged the refresh error but kept returning the
same stale cached auth, so repeated `auth()` calls could keep retrying
with dead state.
This differed from the existing unauthorized-recovery flow, which
already did the safer thing: guarded reload from disk first, then
refresh only if the on-disk auth was unchanged.
## What Changed
- Switched proactive refresh in `AuthManager::auth()` to:
- do a pure staleness check on cached auth
- call `refresh_token()` when stale
- return the original cached auth on genuine refresh failure, preserving
existing outward behavior
- Removed the direct proactive refresh-from-cached-state path
- Added regression tests covering:
- stale cached auth with newer same-account auth already on disk
- the same scenario even when the refresh endpoint would fail if called
## Why This Fix
`refresh_token()` already contains the right cross-process safety
behavior:
- guarded reload from disk
- same-account verification
- skip-refresh when another process already changed auth
Reusing that path makes proactive refresh consistent with unauthorized
recovery and prevents stale processes from trying to refresh
already-rotated tokens.
## Testing
Test shape:
- create a fresh temp `CODEX_HOME` from `~/.codex/auth.json`
- force `last_refresh` to an old timestamp so proactive refresh is
required
- start two long-lived helper processes against the same auth file
- start `B` first so it caches stale auth and sleeps
- start `A` second so it refreshes first
- point both at a local mock `/oauth/token` server
- inspect whether `B` makes a second refresh request with the stale
in-memory token, or reloads the rotated token from disk
### Before the fix
The repro showed the bug clearly: the mock server saw two refreshes with
the same stale token, `A` rotated to a new token, and `B` still returned
the stale token instead of reloading from disk.
```text
POST /oauth/token refresh_token=rt_j6s0...
POST /oauth/token refresh_token=rt_j6s0...
B:cached_before=rt_j6s0...
B:cached_after=rt_j6s0...
B:returned=rt_j6s0...
A:cached_before=rt_j6s0...
A:cached_after=rotated-refresh-token-logged-run-v2
A:returned=rotated-refresh-token-logged-run-v2
```
### After the fix
After the fix, the mock server saw only one refresh request. `A`
refreshed once, and `B` started with the stale token but reloaded and
returned the rotated token.
```text
POST /oauth/token refresh_token=rt_j6s0...
B:cached_before=rt_j6s0...
B:cached_after=rotated-refresh-token-fix-branch
B:returned=rotated-refresh-token-fix-branch
A:cached_before=rt_j6s0...
A:cached_after=rotated-refresh-token-fix-branch
A:returned=rotated-refresh-token-fix-branch
```
This shows the new behavior: `A` refreshes once, then `B` reuses the
updated auth from disk instead of making a second refresh request with
the stale token.
Send input now sends messages as assistant message and with this format:
```
author: /root/worker_a
recipient: /root/worker_a/tester
other_recipients: []
Content: bla bla bla. Actual content. Only text for now
```
## Summary
- queue input after the user submits `/compact` until that manual
compact turn ends
- mirror the same behavior in the app-server TUI
- add regressions for input queued before compact starts and while it is
running
Co-authored-by: Codex <noreply@openai.com>
- Duplicate app mentions are now suppressed when they’re plugin-backed
with the same display name.
- Remaining connector mentions now label category as [Plugin] when
plugin metadata is present, otherwise [App].
- Mention result lists are now capped to 8 rows after filtering.
- Updates both tui and tui_app_server with the same changes.
## Why
Fixes [#15283](https://github.com/openai/codex/issues/15283), where
sandboxed tool calls fail on older distro `bubblewrap` builds because
`/usr/bin/bwrap` does not understand `--argv0`. The upstream [bubblewrap
v0.9.0 release
notes](https://github.com/containers/bubblewrap/releases/tag/v0.9.0)
explicitly call out `Add --argv0`. Flipping `use_legacy_landlock`
globally works around that compatibility bug, but it also weakens the
default Linux sandbox and breaks proxy-routed and split-policy cases
called out in review.
The follow-up Linux CI failure was in the new launcher test rather than
the launcher logic: the fake `bwrap` helper stayed open for writing, so
Linux would not exec it. This update also closes the user-visibility gap
from review by surfacing the same startup warning when `/usr/bin/bwrap`
is present but too old for `--argv0`, not only when it is missing.
## What Changed
- keep `use_legacy_landlock` default-disabled
- teach `codex-rs/linux-sandbox/src/launcher.rs` to fall back to the
vendored bubblewrap build when `/usr/bin/bwrap` does not advertise
`--argv0` support
- add launcher tests for supported, unsupported, and missing system
`bwrap`
- write the fake `bwrap` test helper to a closed temp path so the
supported-path launcher test works on Linux too
- extend the startup warning path so Codex warns when `/usr/bin/bwrap`
is missing or too old to support `--argv0`
- mirror the warning/fallback wording across
`codex-rs/linux-sandbox/README.md` and `codex-rs/core/README.md`,
including that the fallback is the vendored bubblewrap compiled into the
binary
- cite the upstream `bubblewrap` release that introduced `--argv0`
## Verification
- `bazel test --config=remote --platforms=//:rbe
//codex-rs/linux-sandbox:linux-sandbox-unit-tests
--test_filter=launcher::tests::prefers_system_bwrap_when_help_lists_argv0
--test_output=errors`
- `cargo test -p codex-core system_bwrap_warning`
- `cargo check -p codex-exec -p codex-tui -p codex-tui-app-server -p
codex-app-server`
- `just argument-comment-lint`
## Summary
- use Shift+Left to edit the most recent queued message when running
under tmux
- mirror the same binding change in the app-server TUI
- add tmux-specific tests and snapshot coverage for the rendered
queued-message hint
## Testing
- just fmt
- cargo test -p codex-tui
- cargo test -p codex-tui-app-server
- just argument-comment-lint -p codex-tui -p codex-tui-app-server
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add a snapshot-style core test for fork startup context injection
followed by first-turn diff injection
- capture the current duplicated startup-plus-turn context behavior
without changing runtime logic
## Testing
- not run locally; relying on CI
- just fmt
---------
Co-authored-by: Codex <noreply@openai.com>
Remove the legacy `smart_approvals` config migration from core config
loading.
This change:
- stops rewriting `smart_approvals` into `guardian_approval`
- stops backfilling `approvals_reviewer = "guardian_subagent"`
- replaces the migration tests with regression coverage that asserts the
deprecated key is ignored in root and profile scopes
Verification:
- `just fmt`
- `cargo test -p codex-core smart_approvals_alias_is_ignored`
- `cargo test -p codex-core approvals_reviewer_`
- `just argument-comment-lint`
Notes:
- `cargo test -p codex-core` still hits an unrelated existing failure in
`tools::js_repl::tests::js_repl_imported_local_files_can_access_repl_globals`;
the JS REPL kernel exits after `mktemp` fails under the current
environment.
Enhancement request: requested cleanup to delete the `smart_approvals`
alias migration; no public issue link is available.
Co-authored-by: Codex <noreply@openai.com>
## Summary
- remove `tui_app_server` handling for legacy app-server notifications
- drop the local ChatGPT auth refresh request path from `tui_app_server`
- remove the now-unused refresh response helper from local auth loading
Split out of #15106 so the `tui_app_server` cleanup can land separately
from the larger `codex-exec` app-server migration.
As part of moving the TUI onto the app server, we added some temporary
handling of some legacy events. We've confirmed that these do not need
to be supported, so this PR removes this support from the
tui_app_server, allowing for additional simplifications in follow-on
PRs. These events are needed only for very old rollouts. None of the
other app server clients (IDE extension or app) support these either.
## Summary
- stop translating legacy `codex/event/*` notifications inside
`tui_app_server`
- remove the TUI-side legacy warning and rollback buffering/replay paths
that were only fed by those notifications
- keep the lower-level app-server and app-server-client legacy event
plumbing intact so PR #15106 can rebase on top and handle the remaining
exec/lower-layer migration separately
Moves Code Mode to a new crate with no dependencies on codex. This
create encodes the code mode semantics that we want for lifetime,
mounting, tool calling.
The model-facing surface is mostly unchanged. `exec` still runs raw
JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools
are still available through `tools.*`, and helpers like `text`, `image`,
`store`, `load`, `notify`, `yield_control`, and `exit` still exist.
The major change is underneath that surface:
- Old code mode was an external Node runtime.
- New code mode is an in-process V8 runtime embedded directly in Rust.
- Old code mode managed cells inside a long-lived Node runner process.
- New code mode manages cells in Rust, with one V8 runtime thread per
active `exec`.
- Old code mode used JSON protocol messages over child stdin/stdout plus
Node worker-thread messages.
- New code mode uses Rust channels and direct V8 callbacks/events.
This PR also fixes the two migration regressions that fell out of that
substrate change:
- `wait { terminate: true }` now waits for the V8 runtime to actually
stop before reporting termination.
- synchronous top-level `exit()` now succeeds again instead of surfacing
as a script error.
---
- `core/src/tools/code_mode/*` is now mostly an adapter layer for the
public `exec` / `wait` tools.
- `code-mode/src/service.rs` owns cell sessions and async control flow
in Rust.
- `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and
JavaScript execution.
- each `exec` spawns a dedicated runtime thread plus a Rust
session-control task.
- helper globals are installed directly into the V8 context instead of
being injected through a source prelude.
- helper modules like `tools.js` and `@openai/code_mode` are synthesized
through V8 module resolution callbacks in Rust.
---
Also added a benchmark for showing the speed of init and use of a code
mode env:
```
$ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128
Finished [`bench` profile [optimized]](https://doc.rust-lang.org/cargo/reference/profiles.html#default-profiles) target(s) in 0.18s
Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae)
exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128]
scenario tools samples warmups iters mean/exec p95/exec rssΔ p50 rssΔ max
cold_exec 0 30 0 1 1.13ms 1.20ms 8.05MiB 8.06MiB
warm_exec 0 30 1 25 473.43us 512.49us 912.00KiB 1.33MiB
cold_exec 32 30 0 1 1.03ms 1.15ms 8.08MiB 8.11MiB
warm_exec 32 30 1 25 509.73us 545.76us 960.00KiB 1.30MiB
cold_exec 128 30 0 1 1.14ms 1.19ms 8.30MiB 8.34MiB
warm_exec 128 30 1 25 575.08us 591.03us 736.00KiB 864.00KiB
memory uses a fresh-process max RSS delta for each scenario
```
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
If we are in a mode that is already explicitly un-sandboxed, then
`ApprovalPolicy::Never` should not block dangerous commands.
## Testing
- [x] Existing unit test covers old behavior
- [x] Added a unit test for this new case
## Summary
This PR fixes restricted filesystem permission profiles so Codex's
runtime-managed helper executables remain readable without requiring
explicit user configuration.
- add implicit readable roots for the configured `zsh` helper path and
the main execve wrapper
- allowlist the shared `$CODEX_HOME/tmp/arg0` root when the execve
wrapper lives there, so session-specific helper paths keep working
- dedupe injected paths and avoid adding duplicate read entries to the
sandbox policy
- add regression coverage for restricted read mode with helper
executable overrides
## Testing
before this change: got this error when executing a shell command via
zsh fork:
```
"sandbox error: sandbox denied exec error, exit code: 127, stdout: , stderr: /etc/zprofile:11: operation not permitted: /usr/libexec/path_helper\nzsh:1: operation not permitted: .codex/skills/proxy-a/scripts/fetch_example.sh\n"
```
saw this change went away after this change, meaning the readable roots
and injected correctly.
- emit a typed `thread/realtime/transcriptUpdated` notification from
live realtime transcript deltas
- expose that notification as flat `threadId`, `role`, and `text` fields
instead of a nested transcript array
- continue forwarding raw `handoff_request` items on
`thread/realtime/itemAdded`, including the accumulated
`active_transcript`
- update app-server docs, tests, and generated protocol schema artifacts
to match the delta-based payloads
---------
Co-authored-by: Codex <noreply@openai.com>
This adds a dummy v8-poc project that in Cargo links against our
prebuilt binaries and the ones provided by rusty_v8 for non musl
platforms. This demonstrates that we can successfully link and use v8 on
all platforms that we want to target.
In bazel things are slightly more complicated. Since the libraries as
published have libc++ linked in already we end up with a lot of double
linked symbols if we try to use them in bazel land. Instead we fall back
to building rusty_v8 and v8 from source (cached of course) on the
platforms we ship to.
There is likely some compatibility drift in the windows bazel builder
that we'll need to reconcile before we can re-enable them. I'm happy to
be on the hook to unwind that.
## TL;DR
Pin the Python app-server SDK subprocess pipes to UTF-8 so Windows users
on non-UTF-8 locales do not hit `UnicodeDecodeError` when the `codex`
child emits UTF-8 text.
- add `encoding="utf-8"` to the `subprocess.Popen(...)` call in
`AppServerClient.start()`
- add a focused regression test that asserts the client launches the
subprocess with UTF-8 text I/O
- validates with `python -m pytest
sdk/python/tests/test_client_rpc_methods.py
sdk/python/tests/test_client_process_launch.py
sdk/python/tests/test_public_api_runtime_behavior.py`
Fixes#14311.
This PR add an URI-based system to reference agents within a tree. This
comes from a sync between research and engineering.
The main agent (the one manually spawned by a user) is always called
`/root`. Any sub-agent spawned by it will be `/root/agent_1` for example
where `agent_1` is chosen by the model.
Any agent can contact any agents using the path.
Paths can be used either in absolute or relative to the calling agents
Resume is not supported for now on this new path
Fix Bazel macOS CI failures caused by the llvm module's pinned macOS SDK
URL returning 403 Forbidden from Apple's CDN.
Bump llvm to 0.6.8, switch to the new osx.from_archive(...) /
osx.frameworks(...) API, and refresh MODULE.bazel.lock so Bazel uses the
updated SDK archive configuration.
## Summary
- make app-server treat `clientInfo.name == "codex-tui"` as a legacy
compatibility case
- fall back to `DEFAULT_ORIGINATOR` instead of sending `codex-tui` as
the originator header
- add a TODO noting this is a temporary workaround that should be
removed later
## Testing
- Not run (not requested)
`CODEX_TEST_REMOTE_ENV` will make `test_codex` start the executor
"remotely" (inside a docker container) turning any integration test into
remote test.
## Summary
- add a short guardian follow-up developer reminder before reused
reviews
- cache prior-review state on the guardian session instead of rescanning
full history on each request
- update guardian follow-up coverage and snapshot expectations
---------
Co-authored-by: Codex <noreply@openai.com>
### Preliminary /plugins TUI menu
- Adds a preliminary /plugins menu flow in both tui and tui_app_server.
- Fetches plugin list data asynchronously and shows loading/error/cached
states.
- Limits this first pass to the curated ChatGPT marketplace.
- Shows available plugins with installed/status metadata.
- Supports in-menu search over plugin display name, plugin id, plugin
name, and marketplace label.
- Opens a plugin detail view on selection, including summaries for
Skills, Apps, and MCP Servers, with back navigation.
### Testing
- Launch codex-cli with plugins enabled (`--enable plugins`).
- Run /plugins and verify:
- loading state appears first
- plugin list is shown
- search filters results
- selecting a plugin opens detail view, with a list of
skills/connectors/MCP servers for the plugin
- back action returns to the list.
- Verify disabled behavior by running /plugins without plugins enabled
(shows “Plugins are disabled” message).
- Launch with `--enable tui_app_server` (and plugins enabled) and repeat
the same /plugins flow; behavior should match.
## Why
The argument-comment lint now has a packaged DotSlash artifact from
[#15198](https://github.com/openai/codex/pull/15198), so the normal repo
lint path should use that released payload instead of rebuilding the
lint from source every time.
That keeps `just clippy` and CI aligned with the shipped artifact while
preserving a separate source-build path for people actively hacking on
the lint crate.
The current alpha package also exposed two integration wrinkles that the
repo-side prebuilt wrapper needs to smooth over:
- the bundled Dylint library filename includes the host triple, for
example `@nightly-2025-09-18-aarch64-apple-darwin`, and Dylint derives
`RUSTUP_TOOLCHAIN` from that filename
- on Windows, Dylint's driver path also expects `RUSTUP_HOME` to be
present in the environment
Without those adjustments, the prebuilt CI jobs fail during `cargo
metadata` or driver setup. This change makes the checked-in prebuilt
wrapper normalize the packaged library name to the plain
`nightly-2025-09-18` channel before invoking `cargo-dylint`, and it
teaches both the wrapper and the packaged runner source to infer
`RUSTUP_HOME` from `rustup show home` when the environment does not
already provide it.
After the prebuilt Windows lint job started running successfully, it
also surfaced a handful of existing anonymous literal callsites in
`windows-sandbox-rs`. This PR now annotates those callsites so the new
cross-platform lint job is green on the current tree.
## What Changed
- checked in the current
`tools/argument-comment-lint/argument-comment-lint` DotSlash manifest
- kept `tools/argument-comment-lint/run.sh` as the source-build wrapper
for lint development
- added `tools/argument-comment-lint/run-prebuilt-linter.sh` as the
normal enforcement path, using the checked-in DotSlash package and
bundled `cargo-dylint`
- updated `just clippy` and `just argument-comment-lint` to use the
prebuilt wrapper
- split `.github/workflows/rust-ci.yml` so source-package checks live in
a dedicated `argument_comment_lint_package` job, while the released lint
runs in an `argument_comment_lint_prebuilt` matrix on Linux, macOS, and
Windows
- kept the pinned `nightly-2025-09-18` toolchain install in the prebuilt
CI matrix, since the prebuilt package still relies on rustup-provided
toolchain components
- updated `tools/argument-comment-lint/run-prebuilt-linter.sh` to
normalize host-qualified nightly library filenames, keep the `rustup`
shim directory ahead of direct toolchain `cargo` binaries, and export
`RUSTUP_HOME` when needed for Windows Dylint driver setup
- updated `tools/argument-comment-lint/src/bin/argument-comment-lint.rs`
so future published DotSlash artifacts apply the same nightly-filename
normalization and `RUSTUP_HOME` inference internally
- fixed the remaining Windows lint violations in
`codex-rs/windows-sandbox-rs` by adding the required `/*param*/`
comments at the reported callsites
- documented the checked-in DotSlash file, wrapper split, archive
layout, nightly prerequisite, and Windows `RUSTUP_HOME` requirement in
`tools/argument-comment-lint/README.md`
## Summary
- match the exec-process structure to filesystem PR #15232
- expose `ExecProcess` on `Environment`
- make `LocalProcess` the real implementation and `RemoteProcess` a thin
network proxy over `ExecServerClient`
- make `ProcessHandler` a thin RPC adapter delegating to `LocalProcess`
- add a shared local/remote process test
## Validation
- `just fmt`
- `CARGO_TARGET_DIR=~/.cache/cargo-target/codex cargo test -p
codex-exec-server`
- `just fix -p codex-exec-server`
---------
Co-authored-by: Codex <noreply@openai.com>
- Split the feature system into a new `codex-features` crate.
- Cut `codex-core` and workspace consumers over to the new config and
warning APIs.
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
- Move the auth implementation and token data into codex-login.
- Keep codex-core re-exporting that surface from codex-login for
existing callers.
---------
Co-authored-by: Codex <noreply@openai.com>
Alternative approach, we use rusty_v8 for all platforms that its
predefined, but lets build from source a musl v8 version with bazel for
x86 and aarch64 only. We would need to release this on github and then
use the release.
For each feature we have:
1. Trait exposed on environment
2. **Local Implementation** of the trait
3. Remote implementation that uses the client to proxy via network
4. Handler implementation that handles PRC requests and calls into
**Local Implementation**
## Summary
Some background. We're looking to instrument GA turns end to end. Right
now a big gap is grouping mcp tool calls with their codex sessions. We
send session id and turn id headers to the responses call but not the
mcp/wham calls.
Ideally we could pass the args as headers like with responses, but given
the setup of the rmcp client, we can't send as headers without either
changing the rmcp package upstream to allow per request headers or
introducing a mutex which break concurrency. An earlier attempt made the
assumption that we had 1 client per thread, which allowed us to set
headers at the start of a turn. @pakrym mentioned that this assumption
might break in the near future.
So the solution now is to package the turn metadata/session id into the
_meta field in the post body and pull out in codex-backend.
- send turn metadata to MCP servers via `tools/call` `_meta` instead of
assuming per-thread request headers on shared clients
- preserve the existing `_codex_apps` metadata while adding
`x-codex-turn-metadata` for all MCP tool calls
- extend tests to cover both custom MCP servers and the codex apps
search flow
---------
Co-authored-by: Codex <noreply@openai.com>
updated Windows shell/unified_exec tool descriptions:
`exec_command`
```text
Runs a command in a PTY, returning output or a session ID for ongoing interaction.
Windows safety rules:
- Do not compose destructive filesystem commands across shells. Do not enumerate paths in PowerShell and then pass them to `cmd /c`, batch builtins, or another shell for deletion or moving. Use one shell end-to-end, prefer native PowerShell cmdlets such as `Remove-Item` / `Move-Item` with `-LiteralPath`, and avoid string-built shell commands for file operations.
- Before any recursive delete or move on Windows, verify the resolved absolute target paths stay within the intended workspace or explicitly named target directory. Never issue a recursive delete or move against a computed path if the final target has not been checked.
```
`shell`
```text
Runs a Powershell command (Windows) and returns its output. Arguments to `shell` will be passed to CreateProcessW(). Most commands should be prefixed with ["powershell.exe", "-Command"].
Examples of valid command strings:
- ls -a (show hidden): ["powershell.exe", "-Command", "Get-ChildItem -Force"]
- recursive find by name: ["powershell.exe", "-Command", "Get-ChildItem -Recurse -Filter *.py"]
- recursive grep: ["powershell.exe", "-Command", "Get-ChildItem -Path C:\\myrepo -Recurse | Select-String -Pattern 'TODO' -CaseSensitive"]
- ps aux | grep python: ["powershell.exe", "-Command", "Get-Process | Where-Object { $_.ProcessName -like '*python*' }"]
- setting an env var: ["powershell.exe", "-Command", "$env:FOO='bar'; echo $env:FOO"]
- running an inline Python script: ["powershell.exe", "-Command", "@'\nprint('Hello, world!')\n'@ | python -"]
Windows safety rules:
- Do not compose destructive filesystem commands across shells. Do not enumerate paths in PowerShell and then pass them to `cmd /c`, batch builtins, or another shell for deletion or moving. Use one shell end-to-end, prefer native PowerShell cmdlets such as `Remove-Item` / `Move-Item` with `-LiteralPath`, and avoid string-built shell commands for file operations.
- Before any recursive delete or move on Windows, verify the resolved absolute target paths stay within the intended workspace or explicitly named target directory. Never issue a recursive delete or move against a computed path if the final target has not been checked.
```
`shell_command`
```text
Runs a Powershell command (Windows) and returns its output.
Examples of valid command strings:
- ls -a (show hidden): "Get-ChildItem -Force"
- recursive find by name: "Get-ChildItem -Recurse -Filter *.py"
- recursive grep: "Get-ChildItem -Path C:\\myrepo -Recurse | Select-String -Pattern 'TODO' -CaseSensitive"
- ps aux | grep python: "Get-Process | Where-Object { $_.ProcessName -like '*python*' }"
- setting an env var: "$env:FOO='bar'; echo $env:FOO"
- running an inline Python script: "@'\nprint('Hello, world!')\n'@ | python -"
Windows safety rules:
- Do not compose destructive filesystem commands across shells. Do not enumerate paths in PowerShell and then pass them to `cmd /c`, batch builtins, or another shell for deletion or moving. Use one shell end-to-end, prefer native PowerShell cmdlets such as `Remove-Item` / `Move-Item` with `-LiteralPath`, and avoid string-built shell commands for file operations.
- Before any recursive delete or move on Windows, verify the resolved absolute target paths stay within the intended workspace or explicitly named target directory. Never issue a recursive delete or move against a computed path if the final target has not been checked.
```
- Move core/src/terminal.rs and its tests into a standalone
terminal-detection workspace crate.
- Update direct consumers to depend on codex-terminal-detection and
import terminal APIs directly.
---------
Co-authored-by: Codex <noreply@openai.com>
## Problem
When multiple Codex sessions are open at once, terminal tabs and windows
are hard to distinguish from each other. The existing status line only
helps once the TUI is already focused, so it does not solve the "which
tab is this?" problem.
This PR adds a first-class `/title` command so the terminal window or
tab title can carry a short, configurable summary of the current
session.
## Screenshot
<img width="849" height="320" alt="image"
src="https://github.com/user-attachments/assets/8b112927-7890-45ed-bb1e-adf2f584663d"
/>
## Mental model
`/statusline` and `/title` are separate status surfaces with different
constraints. The status line is an in-app footer that can be denser and
more detailed. The terminal title is external terminal metadata, so it
needs short, stable segments that still make multiple sessions easy to
tell apart.
The `/title` configuration is an ordered list of compact items. By
default it renders `spinner,project`, so active sessions show
lightweight progress first while idle sessions still stay easy to
disambiguate. Each configured item is omitted when its value is not
currently available rather than forcing a placeholder.
## Non-goals
This does not merge `/title` into `/statusline`, and it does not add an
arbitrary free-form title string. The feature is intentionally limited
to a small set of structured items so the title stays short and
reviewable.
This also does not attempt to restore whatever title the terminal or
shell had before Codex started. When Codex clears the title, it clears
the title Codex last wrote.
## Tradeoffs
A separate `/title` command adds some conceptual overlap with
`/statusline`, but it keeps title-specific constraints explicit instead
of forcing the status line model to cover two different surfaces.
Title refresh can happen frequently, so the implementation now shares
parsing and git-branch orchestration between the status line and title
paths, and caches the derived project-root name by cwd. That keeps the
hot path cheap without introducing background polling.
## Architecture
The TUI gets a new `/title` slash command and a dedicated picker UI for
selecting and ordering terminal-title items. The chosen ids are
persisted in `tui.terminal_title`, with `spinner` and `project` as the
default when the config is unset. `status` remains available as a
separate text item, so configurations like `spinner,status` render
compact progress like `⠋ Working`.
`ChatWidget` now refreshes both status surfaces through a shared
`refresh_status_surfaces()` path. That shared path parses configured
items once, warns on invalid ids once, synchronizes shared cached state
such as git-branch lookup, then renders the footer status line and
terminal title from the same snapshot.
Low-level OSC title writes live in `codex-rs/tui/src/terminal_title.rs`,
which owns the terminal write path and last-mile sanitization before
emitting OSC 0.
## Security
Terminal-title text is treated as untrusted display content before Codex
emits it. The write path strips control characters, removes invisible
and bidi formatting characters that can make the title visually
misleading, normalizes whitespace, and caps the emitted length.
References used while implementing this:
- [xterm control
sequences](https://invisible-island.net/xterm/ctlseqs/ctlseqs.html)
- [WezTerm escape sequences](https://wezterm.org/escape-sequences.html)
- [CWE-150: Improper Neutralization of Escape, Meta, or Control
Sequences](https://cwe.mitre.org/data/definitions/150.html)
- [CERT VU#999008 (Trojan Source)](https://kb.cert.org/vuls/id/999008)
- [Trojan Source disclosure site](https://trojansource.codes/)
- [Unicode Bidirectional Algorithm (UAX
#9)](https://www.unicode.org/reports/tr9/)
- [Unicode Security Considerations (UTR
#36)](https://www.unicode.org/reports/tr36/)
## Observability
Unknown configured title item ids are warned about once instead of
repeatedly spamming the transcript. Live preview applies immediately
while the `/title` picker is open, and cancel rolls the in-memory title
selection back to the pre-picker value.
If terminal title writes fail, the TUI emits debug logs around set and
clear attempts. The rendered status label intentionally collapses richer
internal states into compact title text such as `Starting...`, `Ready`,
`Thinking...`, `Working...`, `Waiting...`, and `Undoing...` when
`status` is configured.
## Tests
Ran:
- `just fmt`
- `cargo test -p codex-tui`
At the moment, the red Windows `rust-ci` failures are due to existing
`codex-core` `apply_patch_cli` stack-overflow tests that also reproduce
on `main`. The `/title`-specific `codex-tui` suite is green.
## Summary
- log guardian-reviewed tool approvals as `source=automated_reviewer` in
`codex.tool_decision`
- keep direct user approvals as `source=user` and config-driven
approvals as `source=config`
## Testing
-
`/Users/gabec/.codex/skills/codex-oss-fastdev/scripts/codex-rs-fmt-quiet.sh`
-
`/Users/gabec/.codex/skills/codex-oss-fastdev/scripts/codex-rs-test-quiet.sh
-p codex-otel` (fails in sandboxed loopback bind tests under
`otel/tests/suite/otlp_http_loopback.rs`)
- `cargo test -p codex-core guardian -- --nocapture` (original-tree run
reached Guardian tests and only hit sandbox-related listener/proxy
failures)
Co-authored-by: Codex <noreply@openai.com>
Stacked PR 2/3, based on the stub PR.
Adds the exec RPC implementation and process/event flow in exec-server
only.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
To date, the argument-comment linter introduced in
https://github.com/openai/codex/pull/14651 had to be built from source
to run, which can be a bit slow (both for local dev and when it is run
in CI). Because of the potential slowness, I did not wire it up to run
as part of `just clippy` or anything like that. As a result, I have seen
a number of occasions where folks put up PRs that violate the lint, see
it fail in CI, and then have to put up their PR again.
The goal of this PR is to pre-build a runnable version of the linter and
then make it available via a DotSlash file. Once it is available, I will
update `just clippy` and other touchpoints to make it a natural part of
the dev cycle so lint violations should get flagged _before_ putting up
a PR for review.
To get things started, we will build the DotSlash file as part of an
alpha release. Though I don't expect the linter to change often, so I'll
probably change this to only build as part of mainline releases once we
have a working DotSlash file. (Ultimately, we should probably move the
linter into its own repo so it can have its own release cycle.)
## What Changed
- add a reusable `rust-release-argument-comment-lint.yml` workflow that
builds host-specific archives for macOS arm64, Linux arm64/x64, and
Windows x64
- wire `rust-release.yml` to publish the `argument-comment-lint`
DotSlash manifest on all releases for now, including alpha tags
- package a runnable layout instead of a bare library
The Unix archive layout is:
```text
argument-comment-lint/
bin/
argument-comment-lint
cargo-dylint
lib/
libargument_comment_lint@nightly-2025-09-18-<target>.dylib|so
```
On Windows the same layout is published as a `.zip`, with `.exe` and
`.dll` filenames instead.
DotSlash resolves the package entrypoint to
`argument-comment-lint/bin/argument-comment-lint`. That runner finds the
sibling bundled `cargo-dylint` binary plus the single packaged Dylint
library under `lib/`, then invokes `cargo-dylint dylint --lib-path
<that-library>` with the repo's default lint settings.
## Summary
Persist Stop-hook continuation prompts as `user` messages instead of
hidden `developer` messages + some requested integration tests
This is a followup to @pakrym 's comment in
https://github.com/openai/codex/pull/14532 to make sure stop-block
continuation prompts match training for turn loops
- Stop continuation now writes `<hook_prompt hook_run_id="...">stop
hook's user prompt<hook_prompt>`
- Introduces quick-xml dependency, though we already indirectly depended
on it anyway via syntect
- This PR only has about 500 lines of actual logic changes, the rest is
tests/schema
## Testing
Example run (with a sessionstart hook and 3 stop hooks) - this shows
context added by session start, then two stop hooks sending their own
additional prompts in a new turn. The model responds with a single
message addressing both. Then when that turn ends, the hooks detect that
they just ran using `stop_hook_active` and decide not to infinite loop
test files for this (unzip, move codex -> .codex):
[codex.zip](https://github.com/user-attachments/files/26075806/codex.zip)
```
› cats
• Running SessionStart hook: lighting the observatory
SessionStart hook (completed)
warning: Hi, I'm a session start hook for wizard-tower (startup).
hook context: A wimboltine stonpet is an exotic cuisine from hyperspace
• Cats are tiny zen wizards, my friend: equal parts nap, mystery, and chaos. If you want, we can talk
cat facts, cat breeds, cat names, or build something cat-themed in this repo.
• Running Stop hook: checking the tower wards
• Running Stop hook: sacking the guards
• Running Stop hook: hiring the guards
Stop hook (completed)
warning: Wizard Tower Stop hook reviewed the completed reply (177 chars).
Stop hook (blocked)
warning: Wizard Tower Stop hook continuing conversation
feedback: cook the stonpet
Stop hook (blocked)
warning: Wizard Tower Stop hook continuing conversation
feedback: eat the cooked stonpet
• Stonpet’s cooked, aloha style: flash-seared over a blue quasiflame, glazed with nebula salt, and
rested until the hyperspace juices settle.
Now we eat with gratitude, my friend. One mindful bite in, and the flavor is pure cosmic surf:
smoky, bright, and totally out of this dimension.
• Running Stop hook: checking the tower wards
• Running Stop hook: sacking the guards
• Running Stop hook: hiring the guards
Stop hook (completed)
warning: Wizard Tower Stop hook reviewed the completed reply (285 chars).
Stop hook (completed)
warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
Stop hook (completed)
warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
```
The idea is that codex-exec exposes an Environment struct with services
on it. Each of those is a trait.
Depending on construction parameters passed to Environment they are
either backed by local or remote server but core doesn't see these
differences.
Add a representation of the agent graph. This is now used for:
* Cascade close agents (when I close a parent, it close the kids)
* Cascade resume (oposite)
Later, this will also be used for post-compaction stuffing of the
context
Direct fix for: https://github.com/openai/codex/issues/14458
1. Added SessionSource::Custom(String) and --session-source.
2. Enforced plugin and skill products by session_source.
3. Applied the same filtering to curated background refresh.
This PR adds a new `thread/shellCommand` app server API so clients can
implement `!` shell commands. These commands are executed within the
sandbox, and the command text and output are visible to the model.
The internal implementation mirrors the current TUI `!` behavior.
- persist shell command execution as `CommandExecution` thread items,
including source and formatted output metadata
- bridge live and replayed app-server command execution events back into
the existing `tui_app_server` exec rendering path
This PR also wires `tui_app_server` to submit `!` commands through the
new API.
## Description
Adding an extension to the spec that exposes the turn_id to hook
scripts. This is a codex-specific mechanic that allows connecting the
hook runs with particular turns
## Testing
hooks config / sample hooks to use. Extract this, rename codex ->
.codex, and place this into a repo or your home folder. It includes:
config.toml that enables hooks, hooks.json, and sample python hooks:
[codex.zip](https://github.com/user-attachments/files/26102671/codex.zip)
example run (note the turn_ids change between turns):
```
› hello
• Running SessionStart hook: lighting the observatory
SessionStart hook (completed)
warning: Hi, I'm a session start hook for wizard-tower (startup).
hook context: Startup context: A wimboltine stonpet is an exotic cuisine from hyperspace
• Running UserPromptSubmit hook: lighting the observatory lanterns
UserPromptSubmit hook (completed)
warning: wizard-tower UserPromptSubmit demo inspected: hello for turn: 019d036d-c7fa-72d2-b6fd-
78878bfe34e4
hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
phrase 'observatory lanterns lit' near the end.
• Aloha! Grateful to be here and ready to build with you. Show me what you want to tackle in wizard-
tower, and we’ll surf the next wave together. observatory lanterns lit
• Running Stop hook: back to shore
Stop hook (completed)
warning: Wizard Tower Stop hook reviewed the completed reply (170 chars) for turn: 019d036d-c7fa-
72d2-b6fd-78878bfe34e4
› what's a stonpet?
• Running UserPromptSubmit hook: lighting the observatory lanterns
UserPromptSubmit hook (completed)
warning: wizard-tower UserPromptSubmit demo inspected: what's a stonpet? for turn: 019d036e-3164-
72c3-a170-98925564c4fc
hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
phrase 'observatory lanterns lit' near the end.
• A stonpet isn’t a standard real-world word, brah. In our shared context here, a wimboltine stonpet
is an exotic cuisine from hyperspace, so “stonpet” sounds like the dish or food itself.
If you want, we can totally invent the lore for it next. observatory lanterns lit
• Running Stop hook: back to shore
Stop hook (completed)
warning: Wizard Tower Stop hook reviewed the completed reply (271 chars) for turn: 019d036e-3164-
72c3-a170-98925564c4fc
```
## Summary
- forward request-scoped task headers through MCP tool metadata lookups
and tool calls
- apply those headers to streamable HTTP initialize, tools/list, and
tools/call requests
- update affected rmcp/core tests for the new request_headers plumbing
## Testing
- cargo test -p codex-rmcp-client
- cargo test -p codex-core (fails on pre-existing unrelated error in
core/src/auth_env_telemetry.rs: missing websocket_connect_timeout_ms in
ModelProviderInfo initializer)
- just fix -p codex-rmcp-client
- just fix -p codex-core (hits the same unrelated auth_env_telemetry.rs
error)
- just fmt
---------
Co-authored-by: Codex <noreply@openai.com>
## Description
Dependent on:
- [responsesapi] https://github.com/openai/openai/pull/760991
- [codex-backend] https://github.com/openai/openai/pull/760985
`codex app-server -> codex-backend -> responsesapi` now reuses a
persistent websocket connection across many turns. This PR updates
tracing when using websockets so that each `response.create` websocket
request propagates the current tracing context, so we can get a holistic
end-to-end trace for each turn.
Tracing is propagated via special keys (`ws_request_header_traceparent`,
`ws_request_header_tracestate`) set in the `client_metadata` param in
Responses API.
Currently tracing on websockets is a bit broken because we only set
tracing context on ws connection time, so it's detached from a
`turn/start` request.
Summary
- delete the deprecated stdio transport plumbing from the exec server
stack
- add a basic `exec_server()` harness plus test utilities to start a
server, send requests, and await events
- refresh exec-server dependencies, configs, and documentation to
reflect the new flow
Testing
- Not run (not requested)
---------
Co-authored-by: starr-openai <starr@openai.com>
Co-authored-by: Codex <noreply@openai.com>
## TL;DR
Add `thread.run(...)` / `async thread.run(...)` convenience methods to
the Python SDK for the common case.
- add `RunInput = Input | str` and `RunResult` with `final_response`,
collected `items`, and optional `usage`
- keep `thread.turn(...)` strict and lower-level for streaming,
steering, interrupting, and raw generated `Turn` access
- update Python SDK docs, quickstart examples, and tests for the sync
and async convenience flows
## Validation
- `python3 -m pytest sdk/python/tests/test_public_api_signatures.py
sdk/python/tests/test_public_api_runtime_behavior.py`
- `python3 -m pytest
sdk/python/tests/test_real_app_server_integration.py -k
'thread_run_convenience or async_thread_run_convenience'` (skipped in
this environment)
---------
Co-authored-by: Codex <noreply@openai.com>
Stacked PR 1/3.
This is the initialize-only exec-server stub slice: binary/client
scaffolding and protocol docs, without exec/filesystem implementation.
---------
Co-authored-by: Codex <noreply@openai.com>
Resubmit https://github.com/openai/codex/pull/15020 with correct
content.
1. Use requirement-resolved config.features as the plugin gate.
2. Guard plugin/list, plugin/read, and related flows behind that gate.
3. Skip bad marketplace.json files instead of failing the whole list.
4. Simplify plugin state and caching.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
## Summary
This PR makes `thread/resume` reuse persisted thread model metadata when
the caller does not explicitly override it.
Changes:
- read persisted thread metadata from SQLite during `thread/resume`
- reuse persisted `model` and `model_reasoning_effort` as resume-time
defaults
- fetch persisted metadata once and reuse it later in the resume
response path
- keep thread summary loading on the existing rollout path, while
reusing persisted metadata when available
- document the resume fallback behavior in the app-server README
## Why
Before this change, resuming a thread without explicit overrides derived
`model` and `model_reasoning_effort` from current config, which could
drift from the thread’s last persisted values. That meant a resumed
thread could report and run with different model settings than the ones
it previously used.
## Behavior
Precedence on `thread/resume` is now:
1. explicit resume overrides
2. persisted SQLite metadata for the thread
3. normal config resolution for the resumed cwd
## Summary
- store a pre-rendered `feedback_log_body` in SQLite so `/feedback`
exports keep span prefixes and structured event fields
- render SQLite feedback exports with timestamps and level prefixes to
match the old in-memory feedback formatter, while preserving existing
trailing newlines
- count `feedback_log_body` in the SQLite retention budget so structured
or span-prefixed rows still prune correctly
- bound `/feedback` row loading in SQL with the retention estimate, then
apply exact whole-line truncation in Rust so uploads stay capped without
splitting lines
## Details
- add a `feedback_log_body` column to `logs` and backfill it from
`message` for existing rows
- capture span names plus formatted span and event fields at write time,
since SQLite does not retain enough structure to reconstruct the old
formatter later
- keep SQLite feedback queries scoped to the requested thread plus
same-process threadless rows
- restore a SQL-side cumulative `estimated_bytes` cap for feedback
export queries so over-retained partitions do not load every matching
row before truncation
- add focused formatting coverage for exported feedback lines and parity
coverage against `tracing_subscriber`
## Testing
- cargo test -p codex-state
- just fix -p codex-state
- just fmt
codex author: `codex resume 019ca1b0-0ecc-78b1-85eb-6befdd7e4f1f`
---------
Co-authored-by: Codex <noreply@openai.com>
- prefix realtime handoff output with the agent final message label for
both realtime v1 and v2
- update realtime websocket and core expectations to match
## Summary
- detect custom prompts in `$CODEX_HOME/prompts` during TUI startup
- show a deprecation notice only when prompts are present, with guidance
to use `$skill-creator`
- add TUI tests and snapshot coverage for present, missing, and empty
prompts directories
## Testing
- Manually tested
Cleanup image semantics in code mode.
`view_image` now returns `{image_url:string, details?: string}`
`image()` now allows both string parameter and `{image_url:string,
details?: string}`
Fix the CI job by updating it to use artifacts from a more recent
release (`0.115.0`) instead of the existing one (`0.74.0`).
This step in our CI job on PRs started failing today:
334164a6f7/.github/workflows/ci.yml (L33-L47)
I believe it's because this test verifies that the "package npm" script
works, but we want it to be fast and not wait for binaries to be built,
so it uses a GitHub workflow that's already done. Because it was using a
GitHub workflow associated with `0.74.0`, it seems likely that
workflow's history has been reaped, so we need to use a newer one.
## Problem
The app-server TUI (`tui_app_server`) lacked composer history support.
Pressing Up/Down to recall previous prompts hit a stub that logged a
warning and displayed "Not available in app-server TUI yet." New
submissions were silently dropped from the shared history file, so
nothing persisted for future sessions.
## Mental model
Codex maintains a single, append-only history file
(`$CODEX_HOME/history.jsonl`) shared across all TUI processes on the
same machine. The legacy (in-process) TUI already reads/writes this file
through `codex_core::message_history`. The app-server TUI delegates most
operations to a separate process over RPC, but history is intentionally
*not* an RPC concern — it's a client-local file.
This PR makes the app-server TUI access the same history file directly,
bypassing the app-server process entirely. The composer's Up/Down
navigation and submit-time persistence now follow the same code paths as
the legacy TUI, with the only difference being *where* the call is
dispatched (locally in `App`, rather than inside `CodexThread`).
The branch is rebuilt directly on top of `upstream/main`, so it keeps
the
existing app-server restore architecture intact.
`AppServerStartedThread`
still restores transcript history from the server `Thread` snapshot via
`thread_snapshot_events`; this PR only adds composer-history support.
## Non-goals
- Adding history support to the app-server protocol. History remains
client-local.
- Changing the on-disk format or location of `history.jsonl`.
- Surfacing history I/O errors to the user (failures are logged and
silently swallowed, matching the legacy TUI).
## Tradeoffs
| Decision | Why | Risk |
|----------|-----|------|
| Widen `message_history` from `pub(crate)` to `pub` | Avoids
duplicating file I/O logic; the module already has a clean, minimal API
surface. | Other workspace crates can now call these functions — the
contract is no longer crate-private. However, this is consistent with
recent precedent: `590cfa617` exposed `mention_syntax` for TUI
consumption, `752402c4f` exposed plugin APIs (`PluginsManager`), and
`14fcb6645`/`edacbf7b6` widened internal core APIs for other crates.
These were all narrow, intentional exposures of specific APIs — not
broad "make internals public" moves. `1af2a37ad` even went the other
direction, reducing broad re-exports to tighten boundaries. This change
follows the same pattern: a small, deliberate API surface (3 functions)
rather than a wholesale visibility change. |
| Intercept `AddToHistory` / `GetHistoryEntryRequest` in `App` before
RPC fallback | Keeps history ops out of the "unsupported op" error path
without changing app-server protocol. | This now routes through a single
`submit_thread_op` entry point, which is safer than the original
duplicated dispatch. The remaining risk is organizational: future
thread-op submission paths need to keep using that shared entry point. |
| `session_configured_from_thread_response` is now `async` | Needs
`await` on `history_metadata()` to populate real `history_log_id` /
`history_entry_count`. | Adds an async file-stat + full-file newline
scan to the session bootstrap path. The scan is bounded by
`history.max_bytes` and matches the legacy TUI's cost profile, but
startup latency still scales with file size. |
## Architecture
```
User presses Up User submits a prompt
│ │
▼ ▼
ChatComposerHistory ChatWidget::do_submit_turn
navigate_up() encode_history_mentions()
│ │
▼ ▼
AppEvent::CodexOp Op::AddToHistory { text }
(GetHistoryEntryRequest) │
│ ▼
▼ App::try_handle_local_history_op
App::try_handle_local_history_op message_history::append_entry()
spawn_blocking { │
message_history::lookup() ▼
} $CODEX_HOME/history.jsonl
│
▼
AppEvent::ThreadEvent
(GetHistoryEntryResponse)
│
▼
ChatComposerHistory::on_entry_response()
```
## Observability
- `tracing::warn` on `append_entry` failure (includes thread ID).
- `tracing::warn` on `spawn_blocking` lookup join error.
- `tracing::warn` from `message_history` internals on file-open, lock,
or parse failures.
## Tests
- `chat_composer_history::tests::navigation_with_async_fetch` — verifies
that Up emits `Op::GetHistoryEntryRequest` (was: checked for stub error
cell).
- `app::tests::history_lookup_response_is_routed_to_requesting_thread` —
verifies multi-thread composer recall routes the lookup result back to
the originating thread.
-
`app_server_session::tests::resume_response_relies_on_snapshot_replay_not_initial_messages`
— verifies app-server session restore still uses the upstream
thread-snapshot path.
-
`app_server_session::tests::session_configured_populates_history_metadata`
— verifies bootstrap sets nonzero `history_log_id` /
`history_entry_count` from the shared local history file.
1. Use requirement-resolved config.features as the plugin gate.
2. Guard plugin/list, plugin/read, and related flows behind that gate.
3. Skip bad marketplace.json files instead of failing the whole list.
4. Simplify plugin state and caching.
## Summary
If a subagent requests approval, and the user persists that approval to
the execpolicy, it should (by default) propagate. We'll need to rethink
this a bit in light of coming Permissions changes, though I think this
is closer to the end state that we'd want, which is that execpolicy
changes to one permissions profile should be synced across threads.
## Testing
- [x] Added integration test
---------
Co-authored-by: Codex <noreply@openai.com>
### Motivation
- Pinning the action to an immutable commit SHA reduces the risk of
arbitrary code execution in runners with repository access and secrets.
### Description
- Replaced `uses: mlugg/setup-zig@v2` with `uses:
mlugg/setup-zig@d1434d0886 # v2` in three
workflow files.
- Updated the following files: ` .github/workflows/rust-ci.yml`, `
.github/workflows/rust-release.yml`, and `
.github/workflows/shell-tool-mcp.yml` to reference the immutable SHA
while preserving the original `v2` intent in a trailing comment.
### Testing
- No automated tests were run because this is a workflow-only change and
does not affect repository source code, so CI validation will occur on
the next workflow execution.
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69763f570234832d9c67b1b66a27c78d)
- this allows blocking the user's prompts from executing, and also
prevents them from entering history
- handles the edge case where you can both prevent the user's prompt AND
add n amount of additionalContexts
- refactors some old code into common.rs where hooks overlap
functionality
- refactors additionalContext being previously added to user messages,
instead we use developer messages for them
- handles queued messages correctly
Sample hook for testing - if you write "[block-user-submit]" this hook
will stop the thread:
example run
```
› sup
• Running UserPromptSubmit hook: reading the observatory notes
UserPromptSubmit hook (completed)
warning: wizard-tower UserPromptSubmit demo inspected: sup
hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
phrase 'observatory lanterns lit' exactly once near the end.
• Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory
lanterns lit
› and [block-user-submit]
• Running UserPromptSubmit hook: reading the observatory notes
UserPromptSubmit hook (stopped)
warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose.
stop: Wizard Tower demo block: remove [block-user-submit] to continue.
```
.codex/config.toml
```
[features]
codex_hooks = true
```
.codex/hooks.json
```
{
"hooks": {
"UserPromptSubmit": [
{
"hooks": [
{
"type": "command",
"command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py",
"timeoutSec": 10,
"statusMessage": "reading the observatory notes"
}
]
}
]
}
}
```
.codex/hooks/user_prompt_submit_demo.py
```
#!/usr/bin/env python3
import json
import sys
from pathlib import Path
def prompt_from_payload(payload: dict) -> str:
prompt = payload.get("prompt")
if isinstance(prompt, str) and prompt.strip():
return prompt.strip()
event = payload.get("event")
if isinstance(event, dict):
user_prompt = event.get("user_prompt")
if isinstance(user_prompt, str):
return user_prompt.strip()
return ""
def main() -> int:
payload = json.load(sys.stdin)
prompt = prompt_from_payload(payload)
cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"
if "[block-user-submit]" in prompt:
print(
json.dumps(
{
"systemMessage": (
f"{cwd} UserPromptSubmit demo blocked the prompt on purpose."
),
"decision": "block",
"reason": (
"Wizard Tower demo block: remove [block-user-submit] to continue."
),
}
)
)
return 0
prompt_preview = prompt or "(empty prompt)"
if len(prompt_preview) > 80:
prompt_preview = f"{prompt_preview[:77]}..."
print(
json.dumps(
{
"systemMessage": (
f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}"
),
"hookSpecificOutput": {
"hookEventName": "UserPromptSubmit",
"additionalContext": (
"Wizard Tower UserPromptSubmit demo fired. "
"For this reply only, include the exact phrase "
"'observatory lanterns lit' exactly once near the end."
),
},
}
)
)
return 0
if __name__ == "__main__":
raise SystemExit(main())
```
## Summary
- move `guardian_developer_instructions` from managed config into
workspace-managed `requirements.toml`
- have guardian continue using the override when present and otherwise
fall back to the bundled local guardian prompt
- keep the generalized prompt-quality improvements in the shared
guardian default prompt
- update requirements parsing, layering, schema, and tests for the new
source of truth
## Context
This replaces the earlier managed-config / MDM rollout plan.
The intended rollout path is workspace-managed requirements, including
cloud enterprise policies, rather than backend model metadata, Statsig,
or Jamf-managed config. That keeps the default/fallback behavior local
to `codex-rs` while allowing faster policy updates through the
enterprise requirements plane.
This is intentionally an admin-managed policy input, not a user
preference: the guardian prompt should come either from the bundled
`codex-rs` default or from enterprise-managed `requirements.toml`, and
normal user/project/session config should not override it.
## Updating The OpenAI Prompt
After this lands, the OpenAI-specific guardian prompt should be updated
through the workspace Policies UI at `/codex/settings/policies` rather
than through Jamf or codex-backend model metadata.
Operationally:
- open the workspace Policies editor as a Codex admin
- edit the default `requirements.toml` policy, or a higher-precedence
group-scoped override if we ever want different behavior for a subset of
users
- set `guardian_developer_instructions = """..."""` to the full
OpenAI-specific guardian prompt text
- save the policy; codex-backend stores the raw TOML and `codex-rs`
fetches the effective requirements file from `/wham/config/requirements`
When updating the OpenAI-specific prompt, keep it aligned with the
shared default guardian policy in `codex-rs` except for intentional
OpenAI-only additions.
## Testing
- `cargo check --tests -p codex-core -p codex-config -p
codex-cloud-requirements --message-format short`
- `cargo run -p codex-core --bin codex-write-config-schema`
- `cargo fmt`
- `git diff --check`
Co-authored-by: Codex <noreply@openai.com>
- close live realtime sessions on errors, ctrl-c, and active meter
removal
- centralize TUI realtime cleanup and avoid duplicate follow-up close
info
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
## Summary
- support legacy `ReadOnlyAccess::Restricted` on Windows in the elevated
setup/runner backend
- keep the unelevated restricted-token backend on the legacy full-read
model only, and fail closed for restricted read-only policies there
- keep the legacy full-read Windows path unchanged while deriving
narrower read roots only for elevated restricted-read policies
- honor `include_platform_defaults` by adding backend-managed Windows
system roots only when requested, while always keeping helper roots and
the command `cwd` readable
- preserve `workspace-write` semantics by keeping writable roots
readable when restricted read access is in use in the elevated backend
- document the current Windows boundary: legacy `SandboxPolicy` is
supported on both backends, while richer split-only carveouts still fail
closed instead of running with weaker enforcement
## Testing
- `cargo test -p codex-windows-sandbox`
- `cargo check -p codex-windows-sandbox --tests --target
x86_64-pc-windows-msvc`
- `cargo clippy -p codex-windows-sandbox --tests --target
x86_64-pc-windows-msvc -- -D warnings`
- `cargo test -p codex-core windows_restricted_token_`
## Notes
- local `cargo test -p codex-windows-sandbox` on macOS only exercises
the non-Windows stubs; the Windows-targeted compile and clippy runs
provide the local signal, and GitHub Windows CI exercises the runtime
path
Adds an environment crate and environment + file system abstraction.
Environment is a combination of attributes and services specific to
environment the agent is connected to:
File system, process management, OS, default shell.
The goal is to move most of agent logic that assumes environment to work
through the environment abstraction.
- Add shared Product support to marketplace plugin policy and skill
policy (no enforced yet).
- Move marketplace installation/authentication under policy and model it
as MarketplacePluginPolicy.
- Rename plugin/marketplace local manifest types to separate raw serde
shapes from resolved in-memory models.
## TL;DR
WIP esp the examples
Thin the Python SDK public surface so the wrapper layer returns
canonical app-server generated models directly.
- keeps `Codex` / `AsyncCodex` / `Thread` / `Turn` and input helpers,
but removes alias-only type layers and custom result models
- `metadata` now returns `InitializeResponse` and `run()` returns the
generated app-server `Turn`
- updates docs, examples, notebook, and tests to use canonical generated
types and regenerates `v2_all.py` against current schema
- keeps the pinned runtime-package integration flow and real integration
coverage
## Validation
- `PYTHONPATH=sdk/python/src python3 -m pytest sdk/python/tests`
- `GH_TOKEN="$(gh auth token)" RUN_REAL_CODEX_TESTS=1
PYTHONPATH=sdk/python/src python3 -m pytest sdk/python/tests -rs`
---------
Co-authored-by: Codex <noreply@openai.com>
## Problem
Ubuntu/AppArmor hosts started failing in the default Linux sandbox path
after the switch to vendored/default bubblewrap in `0.115.0`.
The clearest report is in
[#14919](https://github.com/openai/codex/issues/14919), especially [this
investigation
comment](https://github.com/openai/codex/issues/14919#issuecomment-4076504751):
on affected Ubuntu systems, `/usr/bin/bwrap` works, but a copied or
vendored `bwrap` binary fails with errors like `bwrap: setting up uid
map: Permission denied` or `bwrap: loopback: Failed RTM_NEWADDR:
Operation not permitted`.
The root cause is Ubuntu's `/etc/apparmor.d/bwrap-userns-restrict`
profile, which grants `userns` access specifically to `/usr/bin/bwrap`.
Once Codex started using a vendored/internal bubblewrap path, that path
was no longer covered by the distro AppArmor exception, so sandbox
namespace setup could fail even when user namespaces were otherwise
enabled and `uidmap` was installed.
## What this PR changes
- prefer system `/usr/bin/bwrap` whenever it is available
- keep vendored bubblewrap as the fallback when `/usr/bin/bwrap` is
missing
- when `/usr/bin/bwrap` is missing, surface a Codex startup warning
through the app-server/TUI warning path instead of printing directly
from the sandbox helper with `eprintln!`
- use the same launcher decision for both the main sandbox execution
path and the `/proc` preflight path
- document the updated Linux bubblewrap behavior in the Linux sandbox
and core READMEs
## Why this fix
This still fixes the Ubuntu/AppArmor regression from
[#14919](https://github.com/openai/codex/issues/14919), but it keeps the
runtime rule simple and platform-agnostic: if the standard system
bubblewrap is installed, use it; otherwise fall back to the vendored
helper.
The warning now follows that same simple rule. If Codex cannot find
`/usr/bin/bwrap`, it tells the user that it is falling back to the
vendored helper, and it does so through the existing startup warning
plumbing that reaches the TUI and app-server instead of low-level
sandbox stderr.
## Testing
- `cargo test -p codex-linux-sandbox`
- `cargo test -p codex-app-server --lib`
- `cargo test -p codex-tui-app-server
tests::embedded_app_server_start_failure_is_returned`
- `cargo clippy -p codex-linux-sandbox --all-targets`
- `cargo clippy -p codex-app-server --all-targets`
- `cargo clippy -p codex-tui-app-server --all-targets`
- route realtime startup, input, and transport failures through a single
shutdown path
- emit one realtime error/closed lifecycle while clearing session state
once
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
- thread the realtime version into conversation start and app-server
notifications
- keep playback-aware mic gating and playback interruption behavior on
v2 only, leaving v1 on the legacy path
## Problem
The `/mcp` command did not work in the app-server TUI (remote mode). On
`main`, `add_mcp_output()` called `McpManager::effective_servers()`
in-process, which only sees locally configured servers, and then emitted
a generic stub message for the app-server to handle. In remote usage,
that left `/mcp` without a real inventory view.
## Solution
Implement `/mcp` for the app-server TUI by fetching MCP server inventory
directly from the app-server via the paginated `mcpServerStatus/list`
RPC and rendering the results into chat history.
The command now follows a three-phase lifecycle:
1. Loading: `ChatWidget::add_mcp_output()` inserts a transient
`McpInventoryLoadingCell` and emits `AppEvent::FetchMcpInventory`. This
gives immediate feedback that the command registered.
2. Fetch: `App::fetch_mcp_inventory()` spawns a background task that
calls `fetch_all_mcp_server_statuses()` over an app-server request
handle. When the RPC completes, it sends `AppEvent::McpInventoryLoaded {
result }`.
3. Resolve: `App::handle_mcp_inventory_result()` clears the loading cell
and renders either `new_mcp_tools_output_from_statuses(...)` or an error
message.
This keeps the main app event loop responsive, so the TUI can repaint
before the remote RPC finishes.
## Notes
- No `app-server` changes were required.
- The rendered inventory includes auth, tools, resources, and resource
templates, plus transport details when they are available from local
config for display enrichment.
- The app-server RPC does not expose authoritative `enabled` or
`disabled_reason` state for MCP servers, so the remote `/mcp` view no
longer renders a `Status:` row rather than guessing from local config.
- RPC failures surface in history as `Failed to load MCP inventory:
...`.
## Tests
- `slash_mcp_requests_inventory_via_app_server`
- `mcp_inventory_maps_prefix_tool_names_by_server`
- `handle_mcp_inventory_result_clears_committed_loading_cell`
- `mcp_tools_output_from_statuses_renders_status_only_servers`
- `mcp_inventory_loading_snapshot`
CXC-410 Emit Env Var Status with `/feedback` report
Add more observability on top of #14611
[Unset](https://openai.sentry.io/issues/7340419168/?project=4510195390611458&query=019cfa8d-c1ba-7002-96fa-e35fc340551d&referrer=issue-stream)
[Set](https://openai.sentry.io/issues/7340426331/?project=4510195390611458&query=019cfa91-aba1-7823-ab7e-762edfbc0ed4&referrer=issue-stream)
<img width="1063" height="610" alt="image"
src="https://github.com/user-attachments/assets/937ab026-1c2d-4757-81d5-5f31b853113e"
/>
###### Summary
- Adds auth-env telemetry that records whether key auth-related env
overrides were present on session start and request paths.
- Threads those auth-env fields through `/responses`, websocket, and
`/models` telemetry and feedback metadata.
- Buckets custom provider `env_key` configuration to a safe
`"configured"` value instead of emitting raw config text.
- Keeps the slice observability-only: no raw token values or raw URLs
are emitted.
###### Rationale (from spec findings)
- 401 and auth-path debugging needs a way to distinguish env-driven auth
paths from sessions with no auth env override.
- Startup and model-refresh failures need the same auth-env diagnostics
as normal request failures.
- Feedback and Sentry tags need the same auth-env signal as OTel events
so reports can be triaged consistently.
- Custom provider config is user-controlled text, so the telemetry
contract must stay presence-only / bucketed.
###### Scope
- Adds a small `AuthEnvTelemetry` bundle for env presence collection and
threads it through the main request/session telemetry paths.
- Does not add endpoint/base-url/provider-header/geo routing attribution
or broader telemetry API redesign.
###### Trade-offs
- `provider_env_key_name` is bucketed to `"configured"` instead of
preserving the literal configured env var name.
- `/models` is included because startup/model-refresh auth failures need
the same diagnostics, but broader parity work remains out of scope.
- This slice keeps the existing telemetry APIs and layers auth-env
fields onto them rather than redesigning the metadata model.
###### Client follow-up
- Add the separate endpoint/base-url attribution slice if routing-source
diagnosis is still needed.
- Add provider-header or residency attribution only if auth-env presence
proves insufficient in real reports.
- Revisit whether any additional auth-related env inputs need safe
bucketing after more 401 triage data.
###### Testing
- `cargo test -p codex-core emit_feedback_request_tags -- --nocapture`
- `cargo test -p codex-core
collect_auth_env_telemetry_buckets_provider_env_key_name -- --nocapture`
- `cargo test -p codex-core
models_request_telemetry_emits_auth_env_feedback_tags_on_failure --
--nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_api_request_auth_observability --
--nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_websocket_connect_auth_observability
-- --nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_websocket_request_transport_observability
-- --nocapture`
- `cargo test -p codex-core --no-run --message-format short`
- `cargo test -p codex-otel --no-run --message-format short`
---------
Co-authored-by: Codex <noreply@openai.com>
Summary
- document that code mode only exposes `exec` and the renamed `wait`
tool
- update code mode tool spec and descriptions to match the new tool name
- rename tests and helper references from `exec_wait` to `wait`
Testing
- Not run (not requested)
## What is flaky
The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in
CI even though the approval logic is unchanged, because the test
delegates the file write and readback to shell parsing instead of
deterministic file I/O.
## Why it was flaky
The test generated a command shaped like `printf ... > file && cat
file`. That means the scenario depended on shell quoting, redirection,
newline handling, and encoding behavior in addition to the approval
system it was actually trying to validate. If the shell interpreted the
payload differently, the test would report an approval failure even
though the product logic was fine.
That also made failures hard to diagnose, because the test did not log
the exact generated command or the parsed result payload.
## How this PR fixes it
This PR replaces the shell-redirection path with a deterministic
`python3 -c` script that writes the file with `Path.write_text(...,
encoding='utf-8')` and then reads it back with the same UTF-8 path. It
also logs the generated command and the resulting exit code/stdout for
the approval scenario so any future failure is directly attributable.
## Why this fix fixes the flakiness
The scenario no longer depends on shell parsing and redirection
semantics. The file contents are produced and read through explicit
UTF-8 file I/O, so the approval test is measuring approval behavior
instead of shell behavior. The added diagnostics mean a future failure
will show the exact command/result pair instead of looking like a
generic intermittent mismatch.
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
## What is flaky
The permissions popup tests in the TUI are flaky, especially on Windows.
They assume the popup opens on a specific row and that a fixed number of
`Up` or `Down` keypresses will land on a specific preset. They also
match popup text too loosely, so a non-selected row can satisfy the
assertion.
## Why it was flaky
These tests were asserting incidental rendering details rather than the
actual selected permission preset. On Windows, the initial selection can
differ from non-Windows runs. Some tests also searched the entire popup
for text like `Guardian Approvals` or `(current)`, which can match a row
that is visible but not selected. Once the popup order or current preset
shifted slightly, a test could fail even though the UI behavior was
still correct.
## How this PR fixes it
This PR adds helpers that identify the selected popup row and selected
preset name directly. The tests now assert the current selection by
name, navigate to concrete target presets instead of assuming a fixed
number of keypresses, and explicitly set the reviewer state in the cases
that require `Guardian Approvals` to be current.
## Why this fix fixes the flakiness
The assertions now track semantic state, not fragile text placement.
Navigation is target-based instead of order-based, so
Windows/non-Windows row differences and harmless popup layout changes no
longer break the tests. That removes the scheduler- and
platform-sensitive assumptions that made the popup suite intermittent.
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
## What is flaky
The Windows shell-driven integration tests in `codex-rs/core` were
intermittently unstable, especially:
- `apply_patch_cli_can_use_shell_command_output_as_patch_input`
- `websocket_test_codex_shell_chain`
- `websocket_v2_test_codex_shell_chain`
## Why it was flaky
These tests were exercising real shell-tool flows through whichever
shell Codex selected on Windows, and the `apply_patch` test also nested
a PowerShell read inside `cmd /c`.
There were multiple independent sources of nondeterminism in that setup:
- The test harness depended on the model-selected Windows shell instead
of pinning the shell it actually meant to exercise.
- `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI
that could leave the read command wrapped as a literal string instead of
executing it.
- Even after getting the quoting right, PowerShell could emit CLIXML
progress records like module-initialization output onto stdout.
- The `apply_patch` test was building a patch directly from shell
stdout, so any quoting artifact or progress noise corrupted the patch
input.
So the failures were driven by shell startup and output-shape variance,
not by the `apply_patch` or websocket logic themselves.
## How this PR fixes it
- Add a test-only `user_shell_override` path so Windows integration
tests can pin `cmd.exe` explicitly.
- Use that override in the websocket shell-chain tests and in the
`apply_patch` harness.
- Change the nested Windows file read in
`apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8
PowerShell `-EncodedCommand` script.
- Run that nested PowerShell process with `-NonInteractive`, set
`$ProgressPreference = 'SilentlyContinue'`, and read the file with
`[System.IO.File]::ReadAllText(...)`.
## Why this fix fixes the flakiness
The outer harness now runs under a deterministic shell, and the inner
PowerShell read no longer depends on fragile `cmd` quoting or on
progress output staying quiet by accident. The shell tool returns only
the file contents, so patch construction and websocket assertions depend
on stable test inputs instead of on runner-specific shell behavior.
---------
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
It now supports:
- Connectors that are from installed and enabled plugins that are not
installed yet
- Plugins that are on the allowlist that are not installed yet.
## Summary
- add device-code ChatGPT sign-in to `tui_app_server` onboarding and
reuse the existing `chatgptAuthTokens` login path
- fall back to browser login when device-code auth is unavailable on the
server
- treat `ChatgptAuthTokens` as an existing signed-in ChatGPT state
during onboarding
- add a local ChatGPT auth loader for handing local tokens to the app
server and serving refresh requests
- handle `account/chatgptAuthTokens/refresh` instead of marking it
unsupported, including workspace/account mismatch checks
- add focused coverage for onboarding success, existing auth handling,
local auth loading, and refresh request behavior
## Testing
- `cargo test -p codex-tui-app-server`
- `just fix -p codex-tui-app-server`
## Summary
This is PR 2 of the Windows sandbox runner split.
PR 1 introduced the framed IPC runner foundation and related Windows
sandbox infrastructure without changing the active elevated one-shot
execution path. This PR switches that elevated one-shot path over to the
new runner IPC transport and removes the old request-file bootstrap that
PR 1 intentionally left in place.
After this change, ordinary elevated Windows sandbox commands still
behave as one-shot executions, but they now run as the simple case of
the same helper/IPC transport that later unified_exec work will build
on.
## Why this is needed for unified_exec
Windows elevated sandboxed execution crosses a user boundary: the CLI
launches a helper as the sandbox user and has to manage command
execution from outside that security context. For one-shot commands, the
old request-file/bootstrap flow was sufficient. For unified_exec, it is
not.
Unified_exec needs a long-lived bidirectional channel so the parent can:
- send a spawn request
- receive structured spawn success/failure
- stream stdout and stderr incrementally
- eventually support stdin writes, termination, and other session
lifecycle events
This PR does not add long-lived sessions yet. It converts the existing
elevated one-shot path to use the same framed IPC transport so that PR 3
can add unified_exec session semantics on top of a transport that is
already exercised by normal elevated command execution.
## Scope
This PR:
- updates `windows-sandbox-rs/src/elevated_impl.rs` to launch the runner
with named pipes, send a framed `SpawnRequest`, wait for `SpawnReady`,
and collect framed `Output`/`Exit` messages
- removes the old `--request-file=...` execution path from
`windows-sandbox-rs/src/elevated/command_runner_win.rs`
- keeps the public behavior one-shot: no session reuse or interactive
unified_exec behavior is introduced here
This PR does not:
- add Windows unified_exec session support
- add background terminal reuse
- add PTY session lifecycle management
## Why Windows needs this and Linux/macOS do not
On Linux and macOS, the existing sandbox/process model composes much
more directly with long-lived process control. The parent can generally
spawn and own the child process (or PTY) directly inside the sandbox
model we already use.
Windows elevated sandboxing is different. The parent is not directly
managing the sandboxed process in the same way; it launches across a
different user/security context. That means long-lived control requires
an explicit helper process plus IPC for spawn, output, exit, and later
stdin/session control.
So the extra machinery here is not because unified_exec is conceptually
different on Windows. It is because the elevated Windows sandbox
boundary requires a helper-mediated transport to support it cleanly.
## Validation
- `cargo test -p codex-windows-sandbox`
### Why
i'm working on something that parses and analyzes codex rollout logs,
and i'd like to have a schema for generating a parser/validator.
`codex app-server generate-internal-json-schema` writes an
`RolloutLine.json` file
while doing this, i noticed we have a writer <> reader mismatch issue on
`FunctionCallOutputPayload` and reasoning item ID -- added some schemars
annotations to fix those
### Test
```
$ just codex app-server generate-internal-json-schema --out ./foo
```
generates an `RolloutLine.json` file, which i validated against jsonl
files on disk
`just codex app-server --help` doesn't expose the
`generate-internal-json-schema` option by default, but you can do `just
codex app-server generate-internal-json-schema --help` if you know the
command
everything else still works
---------
Co-authored-by: Codex <noreply@openai.com>
## What is flaky
`codex-rs/app-server/tests/suite/fuzzy_file_search.rs` intermittently
loses the expected `fuzzyFileSearch/sessionUpdated` and
`fuzzyFileSearch/sessionCompleted` notifications when multiple
fuzzy-search sessions are active and CI delivers notifications out of
order.
## Why it was flaky
The wait helpers were keyed only by JSON-RPC method name.
- `wait_for_session_updated` consumed the next
`fuzzyFileSearch/sessionUpdated` notification even when it belonged to a
different search session.
- `wait_for_session_completed` did the same for
`fuzzyFileSearch/sessionCompleted`.
- Once an unmatched notification was read, it was dropped permanently
instead of buffered.
- That meant a valid completion for the target search could arrive
slightly early, be consumed by the wrong waiter, and disappear before
the test started waiting for it.
The result depended on notification ordering and runner scheduling
instead of on the actual product behavior.
## How this PR fixes it
- Add a buffered notification reader in
`codex-rs/app-server/tests/common/mcp_process.rs`.
- Match fuzzy-search notifications on the identifying payload fields
instead of matching only on method name.
- Preserve unmatched notifications in the in-process queue so later
waiters can still consume them.
- Include pending notification methods in timeout failures to make
future diagnosis concrete.
## Why this fix fixes the flakiness
The test now behaves like a real consumer of an out-of-order event
stream: notifications for other sessions stay buffered until the correct
waiter asks for them. Reordering no longer loses the target event, so
the test result is determined by whether the server emitted the right
notifications, not by which one happened to be read first.
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
## Problem
When the TUI connects to a **remote** app-server (via WebSocket), resume
and fork operations lost all conversation history.
`AppServerStartedThread` carried only the `SessionConfigured` event, not
the full `Thread` snapshot. After resume or fork, the chat transcript
was empty — prior turns were silently discarded.
A secondary issue: `primary_session_configured` was not cleared on
reset, causing stale session state after reconnection.
## Approach: TUI-side only, zero app-server changes
The app-server **already returns** the full `Thread` object (with
populated `turns: Vec<Turn>`) in its `ThreadStartResponse`,
`ThreadResumeResponse`, and `ThreadForkResponse`. The data was always
there — the TUI was simply throwing it away. The old
`AppServerStartedThread` struct only kept the `SessionConfiguredEvent`,
discarding the rich turn history that the server had already provided.
This PR fixes the problem entirely within `tui_app_server` (3 files
changed, 0 changes to `app-server`, `app-server-protocol`, or any other
crate). Rather than modifying the server to send history in a different
format or adding a new endpoint, the fix preserves the existing `Thread`
snapshot and replays it through the TUI's standard event pipeline —
making restored sessions indistinguishable from live ones.
## Solution
Add a **thread snapshot replay** path. When the server hands back a
`Thread` object (on start, resume, or fork),
`restore_started_app_server_thread` converts its historical turns into
the same core `Event` sequence the TUI already processes for live
interactions, then replays them into the event store so the chat widget
renders them.
Key changes:
- **`AppServerStartedThread` now carries the full `Thread`** —
`started_thread_from_{start,resume,fork}_response` clone the thread into
the struct alongside the existing `SessionConfiguredEvent`.
- **`thread_snapshot_events()`** walks the thread's turns and items,
producing `TurnStarted` → `ItemCompleted`* →
`TurnComplete`/`TurnAborted` event sequences that the TUI already knows
how to render.
- **`restore_started_app_server_thread()`** pushes the session event +
history events into the thread channel's store, activates the channel,
and replays the snapshot — used for initial startup, resume, and fork.
- **`primary_session_configured` cleared on reset** to prevent stale
session state after reconnection.
## Tradeoffs
- **`Thread` is cloned into `AppServerStartedThread`**: The full thread
snapshot (including all historical turns) is cloned at startup. For
long-lived threads this could be large, but it's a one-time cost and
avoids lifetime gymnastics with the response.
## Tests
- `restore_started_app_server_thread_replays_remote_history` —
end-to-end: constructs a `Thread` with one completed turn, restores it,
and asserts user/agent messages appear in the transcript.
- `bridges_thread_snapshot_turns_for_resume_restore` — unit: verifies
`thread_snapshot_events` produces the correct event sequence for
completed and interrupted turns.
## Test plan
- [ ] Verify `cargo check -p codex-tui-app-server` passes
- [ ] Verify `cargo test -p codex-tui-app-server` passes
- [ ] Manual: connect to a remote app-server, resume an existing thread,
confirm history renders in the chat widget
- [ ] Manual: fork a thread via remote, confirm prior turns appear
### Summary
The goal is for us to get the latest turn model and reasoning effort on
thread/resume is no override is provided on the thread/resume func call.
This is the part 1 which we write the model and reasoning effort for a
thread to the sqlite db and there will be a followup PR to consume the
two new fields on thread/resume.
[part 2 PR is currently WIP](https://github.com/openai/codex/pull/14888)
and this one can be merged independently.
## Description
This PR fixes a bad first-turn failure mode in app-server when the
startup websocket prewarm hangs. Before this change, `initialize ->
thread/start -> turn/start` could sit behind the prewarm for up to five
minutes, so the client would not see `turn/started`, and even
`turn/interrupt` would block because the turn had not actually started
yet.
Now, we:
- set a (configurable) timeout of 15s for websocket startup time,
exposed as `websocket_startup_timeout_ms` in config.toml
- `turn/started` is sent immediately on `turn/start` even if the
websocket is still connecting
- `turn/interrupt` can be used to cancel a turn that is still waiting on
the websocket warmup
- the turn task will wait for the full 15s websocket warming timeout
before falling back
## Why
The old behavior made app-server feel stuck at exactly the moment the
client expects turn lifecycle events to start flowing. That was
especially painful for external clients, because from their point of
view the server had accepted the request but then went silent for
minutes.
## Configuring the websocket startup timeout
Can set it in config.toml like this:
```
[model_providers.openai]
supports_websockets = true
websocket_connect_timeout_ms = 15000
```
## Summary
- make `report_agent_job_result` atomically transition an item from
running to completed while storing `result_json`
- remove brittle finalization grace-sleep logic and make finished-item
cleanup idempotent
- replace blind fixed-interval waiting with status-subscription-based
waiting for active worker threads
- add state runtime tests for atomic completion and late-report
rejection
## Why
This addresses the race and polling concerns in #13948 by removing
timing-based correctness assumptions and reducing unnecessary status
polling churn.
## Validation
- `cd codex-rs && just fmt`
- `cd codex-rs && cargo test -p codex-state`
- `cd codex-rs && cargo test -p codex-core --test all suite::agent_jobs`
- `cd codex-rs && cargo test`
- fails in an unrelated app-server tracing test:
`message_processor::tracing_tests::thread_start_jsonrpc_span_exports_server_span_and_parents_children`
timed out waiting for response
## Notes
- This PR supersedes #14129 with the same agent-jobs fix on a clean
branch from `main`.
- The earlier PR branch was stacked on unrelated history, which made the
review diff include unrelated commits.
Fixes#13948
## Summary
- skip nonexistent `workspace-write` writable roots in the Linux
bubblewrap mount builder instead of aborting sandbox startup
- keep existing writable roots mounted normally so mixed Windows/WSL
configs continue to work
- add unit and Linux integration regression coverage for the
missing-root case
## Context
This addresses regression A from #14875. Regression B will be handled in
a separate PR.
The old bubblewrap integration added `ensure_mount_targets_exist` as a
preflight guard because bubblewrap bind targets must exist, and failing
early let Codex return a clearer error than a lower-level mount failure.
That policy turned out to be too strict once bubblewrap became the
default Linux sandbox: shared Windows/WSL or mixed-platform configs can
legitimately contain a well-formed writable root that does not exist on
the current machine. This PR keeps bubblewrap's existing-target
requirement, but changes Codex to skip missing writable roots instead of
treating them as fatal configuration errors.
PR https://github.com/openai/codex/pull/14512 added an in-process app
server and started to wire up the tui to use it. We were originally
planning to modify the `tui` code in place, converting it to use the app
server a bit at a time using a hybrid adapter. We've since decided to
create an entirely new parallel `tui_app_server` implementation and do
the conversion all at once but retain the existing `tui` while we work
the bugs out of the new implementation.
This PR undoes the changes to the `tui` made in the PR #14512 and
restores the old initialization to its previous state. This allows us to
modify the `tui_app_server` without the risk of regressing the old `tui`
code. For example, we can start to remove support for all legacy core
events, like the ones that PR https://github.com/openai/codex/pull/14892
needed to ignore.
Testing:
* I manually verified that the old `tui` starts and shuts down without a
problem.
The in-process app-server currently emits both typed
`ServerNotification`s and legacy `codex/event/*` notifications for the
same live turn updates. `tui_app_server` was consuming both paths, so
message deltas and completed items could be enqueued twice and rendered
as duplicated output in the transcript.
Ignore legacy notifications for event types that already have typed (app
server) notification handling, while keeping legacy fallback behavior
for events that still only arrive on the old path. This preserves
compatibility without duplicating streamed commentary or final agent
output.
We will remove all of the legacy event handlers over time; they're here
only during the short window where we're moving the tui to use the app
server.
## Problem
On Linux, Codex can be launched from a workspace path that is a symlink
(for example, a symlinked checkout or a symlinked parent directory).
Our sandbox policy intentionally canonicalizes writable/readable roots
to the real filesystem path before building the bubblewrap mounts. That
part is correct and needed for safety.
The remaining bug was that bubblewrap could still inherit the helper
process's logical cwd, which might be the symlinked alias instead of the
mounted canonical path. In that case, the sandbox starts in a cwd that
does not exist inside the sandbox namespace even though the real
workspace is mounted. This can cause sandboxed commands to fail in
symlinked workspaces.
## Fix
This PR keeps the sandbox policy behavior the same, but separates two
concepts that were previously conflated:
- the canonical cwd used to define sandbox mounts and permissions
- the caller's logical cwd used when launching the command
On the Linux bubblewrap path, we now thread the logical command cwd
through the helper explicitly and only add `--chdir <canonical path>`
when the logical cwd differs from the mounted canonical path.
That means:
- permissions are still computed from canonical paths
- bubblewrap starts the command from a cwd that definitely exists inside
the sandbox
- we do not widen filesystem access or undo the earlier symlink
hardening
## Why This Is Safe
This is a narrow Linux-only launch fix, not a policy change.
- Writable/readable root canonicalization stays intact.
- Protected metadata carveouts still operate on canonical roots.
- We only override bubblewrap's inherited cwd when the logical path
would otherwise point at a symlink alias that is not mounted in the
sandbox.
## Tests
- kept the existing protocol/core regression coverage for symlink
canonicalization
- added regression coverage for symlinked cwd handling in the Linux
bubblewrap builder/helper path
Local validation:
- `just fmt`
- `cargo test -p codex-protocol`
- `cargo test -p codex-core
normalize_additional_permissions_canonicalizes_symlinked_write_paths`
- `cargo clippy -p codex-linux-sandbox -p codex-protocol -p codex-core
--tests -- -D warnings`
- `cargo build --bin codex`
## Context
This is related to #14694. The earlier writable-root symlink fix
addressed the mount/permission side; this PR fixes the remaining
symlinked-cwd launch mismatch in the Linux sandbox path.
## Stack Position
4/4. Top-of-stack sibling built on #14830.
## Base
- #14830
## Sibling
- #14829
## Scope
- Gate low-level mic chunks while speaker playback is active, while
still allowing spoken barge-in.
---------
Co-authored-by: Codex <noreply@openai.com>
## Stack Position
3/4. Top-of-stack sibling built on #14830.
## Base
- #14830
## Sibling
- #14827
## Scope
- Extend the realtime startup context with a bounded summary of the
latest thread turns for continuity.
---------
Co-authored-by: Codex <noreply@openai.com>
This change adds Jason to codex-core's built-in subagent nickname pool
so spawned agents can pick it without any custom role configuration. The
default list was simply missing that predefined name (a grave mistake).
## Stack Position
2/4. Built on top of #14828.
## Base
- #14828
## Unblocks
- #14829
- #14827
## Scope
- Port the realtime v2 wire parsing, session, app-server, and
conversation runtime behavior onto the split websocket-method base.
- Branch runtime behavior directly on the current realtime session kind
instead of parser-derived flow flags.
- Keep regression coverage in the existing e2e suites.
---------
Co-authored-by: Codex <noreply@openai.com>
- Added forceRemoteSync to plugin/install and plugin/uninstall.
- With forceRemoteSync=true, we update the remote plugin status first,
then apply the local change only if the backend call succeeds.
- Kept plugin/list(forceRemoteSync=true) as the main recon path, and for
now it treats remote enabled=false as uninstall. We
will eventually migrate to plugin/installed for more precise state
handling.
### Motivation
- Prevent newly-created skills from being placed in unexpected locations
by prompting for an install path and defaulting to a discoverable
location so skills are usable immediately.
- Make the `skill-creator` instructions explicit about the recommended
default (`~/.codex/skills` / `$CODEX_HOME/skills`) so the agent and
users follow a consistent, discoverable convention.
### Description
- Updated `codex-rs/skills/src/assets/samples/skill-creator/SKILL.md` to
add a user prompt: "Where should I create this skill? If you do not have
a preference, I will place it in ~/.codex/skills so Codex can discover
it automatically.".
- Added guidance before running `init_skill.py` that if the user does
not specify a location, the agent should default to `~/.codex/skills`
(equivalently `$CODEX_HOME/skills`) for auto-discovery.
- Updated the `init_skill.py` examples in the same `SKILL.md` to use
`~/.codex/skills` as the recommended default while keeping one custom
path example.
### Testing
- Ran `cargo test -p codex-skills` and the crate's unit test suite
passed (`1 passed; 0 failed`).
- Verified relevant discovery behavior in code by checking
`codex-rs/utils/home-dir/src/lib.rs` (`find_codex_home` defaults to
`~/.codex`) and `codex-rs/core/src/skills/loader.rs` (user skill roots
include `$CODEX_HOME/skills`).
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69b75a50bb008322a278e55eb0ddccd6)
## Why
Once the repo-local lint exists, `codex-rs` needs to follow the
checked-in convention and CI needs to keep it from drifting. This commit
applies the fallback `/*param*/` style consistently across existing
positional literal call sites without changing those APIs.
The longer-term preference is still to avoid APIs that require comments
by choosing clearer parameter types and call shapes. This PR is
intentionally the mechanical follow-through for the places where the
existing signatures stay in place.
After rebasing onto newer `main`, the rollout also had to cover newly
introduced `tui_app_server` call sites. That made it clear the first cut
of the CI job was too expensive for the common path: it was spending
almost as much time installing `cargo-dylint` and re-testing the lint
crate as a representative test job spends running product tests. The CI
update keeps the full workspace enforcement but trims that extra
overhead from ordinary `codex-rs` PRs.
## What changed
- keep a dedicated `argument_comment_lint` job in `rust-ci`
- mechanically annotate remaining opaque positional literals across
`codex-rs` with exact `/*param*/` comments, including the rebased
`tui_app_server` call sites that now fall under the lint
- keep the checked-in style aligned with the lint policy by using
`/*param*/` and leaving string and char literals uncommented
- cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
registry/git metadata in the lint job
- split changed-path detection so the lint crate's own `cargo test` step
runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
- continue to run the repo wrapper over the `codex-rs` workspace, so
product-code enforcement is unchanged
Most of the code changes in this commit are intentionally mechanical
comment rewrites or insertions driven by the lint itself.
## Verification
- `./tools/argument-comment-lint/run.sh --workspace`
- `cargo test -p codex-tui-app-server -p codex-tui`
- parsed `.github/workflows/rust-ci.yml` locally with PyYAML
---
* -> #14652
* #14651
## Stack Position
1/4. Base PR in the realtime stack.
## Base
- `main`
## Unblocks
- #14830
## Scope
- Split the realtime websocket request builders into `common`, `v1`, and
`v2` modules.
- Keep runtime behavior unchanged in this PR.
---------
Co-authored-by: Codex <noreply@openai.com>
- **Summary**
- expose `exit` through the code mode bridge and module so scripts can
stop mid-flight
- surface the helper in the description documentation
- add a regression test ensuring `exit()` terminates execution cleanly
- **Testing**
- Not run (not requested)
# Summary
This PR introduces the Windows sandbox runner IPC foundation that later
unified_exec work will build on.
The key point is that this is intentionally infrastructure-only. The new
IPC transport, runner plumbing, and ConPTY helpers are added here, but
the active elevated Windows sandbox path still uses the existing
request-file bootstrap. In other words, this change prepares the
transport and module layout we need for unified_exec without switching
production behavior over yet.
Part of this PR is also a source-layout cleanup: some Windows sandbox
files are moved into more explicit `elevated/`, `conpty/`, and shared
locations so it is clearer which code is for the elevated sandbox flow,
which code is legacy/direct-spawn behavior, and which helpers are shared
between them. That reorganization is intentional in this first PR so
later behavioral changes do not also have to carry a large amount of
file-move churn.
# Why This Is Needed For unified_exec
Windows elevated sandboxed unified_exec needs a long-lived,
bidirectional control channel between the CLI and a helper process
running under the sandbox user. That channel has to support:
- starting a process and reporting structured spawn success/failure
- streaming stdout/stderr back incrementally
- forwarding stdin over time
- terminating or polling a long-lived process
- supporting both pipe-backed and PTY-backed sessions
The existing elevated one-shot path is built around a request-file
bootstrap and does not provide those primitives cleanly. Before we can
turn on Windows sandbox unified_exec, we need the underlying runner
protocol and transport layer that can carry those lifecycle events and
streams.
# Why Windows Needs More Machinery Than Linux Or macOS
Linux and macOS can generally build unified_exec on top of the existing
sandbox/process model: the parent can spawn the child directly, retain
normal ownership of stdio or PTY handles, and manage the lifetime of the
sandboxed process without introducing a second control process.
Windows elevated sandboxing is different. To run inside the sandbox
boundary, we cross into a different user/security context and then need
to manage a long-lived process from outside that boundary. That means we
need an explicit helper process plus an IPC transport to carry spawn,
stdin, output, and exit events back and forth. The extra code here is
mostly that missing Windows sandbox infrastructure, not a conceptual
difference in unified_exec itself.
# What This PR Adds
- the framed IPC message types and transport helpers for parent <->
runner communication
- the renamed Windows command runner with both the existing request-file
bootstrap and the dormant IPC bootstrap
- named-pipe helpers for the elevated runner path
- ConPTY helpers and process-thread attribute plumbing needed for
PTY-backed sessions
- shared sandbox/process helpers that later PRs will reuse when
switching live execution paths over
- early file/module moves so later PRs can focus on behavior rather than
layout churn
# What This PR Does Not Yet Do
- it does not switch the active elevated one-shot path over to IPC yet
- it does not enable Windows sandbox unified_exec yet
- it does not remove the existing request-file bootstrap yet
So while this code compiles and the new path has basic validation, it is
not yet the exercised production path. That is intentional for this
first PR: the goal here is to land the transport and runner foundation
cleanly before later PRs start routing real command execution through
it.
# Follow-Ups
Planned follow-up PRs will:
1. switch elevated one-shot Windows sandbox execution to the new runner
IPC path
2. layer Windows sandbox unified_exec sessions on top of the same
transport
3. remove the legacy request-file path once the IPC-based path is live
# Validation
- `cargo build -p codex-windows-sandbox`
###### Why/Context/Summary
- Exclude injected AGENTS.md instructions and standalone skill payloads
from memory stage 1 inputs so memory generation focuses on conversation
content instead of prompt scaffolding.
- Strip only the AGENTS fragment from mixed contextual user messages
during stage-1 serialization, which preserves environment context in the
same message.
- Keep subagent notifications in the memory input, and add focused unit
coverage for the fragment classifier, rollout policy, and stage-1
serialization path.
###### Test plan
- `just fmt`
- `cargo test -p codex-core --lib contextual_user_message`
- `cargo test -p codex-core --lib rollout::policy`
- `cargo test -p codex-core --lib memories::phase1`
This PR replicates the `tui` code directory and creates a temporary
parallel `tui_app_server` directory. It also implements a new feature
flag `tui_app_server` to select between the two tui implementations.
Once the new app-server-based TUI is stabilized, we'll delete the old
`tui` directory and feature flag.
The issue was due to a circular `Drop` schema where the embedded
app-server wait for some listeners that wait for this app-server
them-selves.
The fix is an explicit cleaning
**Repro:**
* Start codex
* Ask it to spawn a sub-agent
* Close Codex
* It takes 5s to exit
Make `interrupted` an agent state and make it not final. As a result, a
`wait` won't return on an interrupted agent and no notification will be
send to the parent agent.
The rationals are:
* If a user interrupt a sub-agent for any reason, you don't want the
parent agent to instantaneously ask the sub-agent to restart
* If a parent agent interrupt a sub-agent, no need to add a noisy
notification in the parent agen
Fix https://github.com/openai/codex/issues/14161
This fixes sub-agent [[skills.config]] overrides being ignored when
parent and child share the same cwd. The root cause was that turn skill
loading rebuilt from cwd-only state and reused a cwd-scoped cache, so
role-local skill enable/disable overrides did not reliably affect the
spawned agent's effective skill set.
This change switches turn construction to use the effective per-turn
config and adds a config-aware skills cache keyed by skill roots plus
final disabled paths.
## Summary
- reuse a guardian subagent session across approvals so reviews keep a
stable prompt cache key and avoid one-shot startup overhead
- clear the guardian child history before each review so prior guardian
decisions do not leak into later approvals
- include the `smart_approvals` -> `guardian_approval` feature flag
rename in the same PR to minimize release latency on a very tight
timeline
- add regression coverage for prompt-cache-key reuse without
prior-review prompt bleed
## Request
- Bug/enhancement request: internal guardian prompt-cache and latency
improvement request
---------
Co-authored-by: Codex <noreply@openai.com>
### Motivation
- Interrupting a running turn (Ctrl+C / Esc) currently also terminates
long‑running background shells, which is surprising for workflows like
local dev servers or file watchers.
- The existing cleanup command name was confusing; callers expect an
explicit command to stop background terminals rather than a UI clear
action.
- Make background‑shell termination explicit and surface a clearer
command name while preserving backward compatibility.
### Description
- Renamed the background‑terminal cleanup slash command from `Clean`
(`/clean`) to `Stop` (`/stop`) and kept `clean` as an alias in the
command parsing/visibility layer, updated the user descriptions and
command popup wiring accordingly.
- Updated the unified‑exec footer text and snapshots to point to `/stop`
(and trimmed corresponding snapshot output to match the new label).
- Changed interrupt behavior so `Op::Interrupt` (Ctrl+C / Esc interrupt)
no longer closes or clears tracked unified exec / background terminal
processes in the TUI or core cleanup path; background shells are now
preserved after an interrupt.
- Updated protocol/docs to clarify that `turn/interrupt` (or
`Op::Interrupt`) interrupts the active turn but does not terminate
background terminals, and that `thread/backgroundTerminals/clean` is the
explicit API to stop those shells.
- Updated unit/integration tests and insta snapshots in the TUI and core
unified‑exec suites to reflect the new semantics and command name.
### Testing
- Ran formatting with `just fmt` in `codex-rs` (succeeded).
- Ran `cargo test -p codex-protocol` (succeeded).
- Attempted `cargo test -p codex-tui` but the build could not complete
in this environment due to a native build dependency that requires
`libcap` development headers (the `codex-linux-sandbox` vendored build
step); install `libcap-dev` / make `libcap.pc` available in
`PKG_CONFIG_PATH` to run the TUI test suite locally.
- Updated and accepted the affected `insta` snapshots for the TUI
changes so visual diffs reflect the new `/stop` wording and preserved
interrupt behavior.
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69b39c44b6dc8323bd133ae206310fae)
- [x] Bypass tool search and stuff tool specs directly into model
context when either a. Tool search is not available for the model or b.
There are not that many tools to search for.
2026-03-15 21:41:55 -07:00
2292 changed files with 242659 additions and 35008 deletions
description: How to run tests using remote executor.
---
Some codex integration tests support a running against a remote executor.
This means that when CODEX_TEST_REMOTE_ENV environment variable is set they will attempt to start an executor process in a docker container CODEX_TEST_REMOTE_ENV points to and use it in tests.
Docker container is built and initialized via ./scripts/test-remote-env.sh
Currently running remote tests is only supported on Linux, so you need to use a devbox to run them
You can list devboxes via `applied_devbox ls`, pick the one with `codex` in the name.
Connect to devbox via `ssh <devbox_name>`.
Reuse the same checkout of codex in `~/code/codex`. Reset files if needed. Multiple checkouts take longer to build and take up more space.
Check whether the SHA and modified files are in sync between remote and local.
@@ -40,6 +40,7 @@ In the codex-rs folder where the rust code lives:
`codex-rs/tui/src/bottom_pane/mod.rs`, and similarly central orchestration modules.
- When extracting code from a large module, move the related tests and module/type docs toward
the new implementation so the invariants stay close to the code that owns them.
- When running Rust commands (e.g. `just fix` or `cargo test`) be patient with the command and never try to kill them using the PID. Rust lock can make the execution slow, this is expected.
Run `just fmt` (in `codex-rs` directory) automatically after you have finished making Rust code changes; do not ask for approval to run it. Additionally, run the tests:
@@ -48,12 +49,29 @@ Run `just fmt` (in `codex-rs` directory) automatically after you have finished m
Before finalizing a large change to `codex-rs`, run `just fix -p <project>` (in `codex-rs` directory) to fix any linter issues in the code. Prefer scoping with `-p` to avoid slow workspace‑wide Clippy builds; only run `just fix` without `-p` if you changed shared crates. Do not re-run tests after running `fix` or `fmt`.
Also run `just argument-comment-lint` to ensure the codebase is clean of comment lint errors.
## The `codex-core` crate
Over time, the `codex-core` crate (defined in `codex-rs/core/`) has become bloated because it is the largest crate, so it is often easier to add something new to `codex-core` rather than refactor out the library code you need so your new code neither takes a dependency on, nor contributes to the size of, `codex-core`.
To that end: **resist adding code to codex-core**!
Particularly when introducing a new concept/feature/API, before adding to `codex-core`, consider whether:
- There is an existing crate other than `codex-core` that is an appropriate place for your new code to live.
- It is time to introduce a new crate to the Cargo workspace for your new functionality. Refactor existing code as necessary to make this happen.
Likewise, when reviewing code, do not hesitate to push back on PRs that would unnecessarily add code to `codex-core`.
## TUI style conventions
See `codex-rs/tui/styles.md`.
## TUI code conventions
- When a change lands in `codex-rs/tui` and `codex-rs/tui_app_server` has a parallel implementation of the same behavior, reflect the change in `codex-rs/tui_app_server` too unless there is a documented reason not to.
- Use concise styling helpers from ratatui’s Stylize trait.
- Basic spans: use "text".into()
- Styled spans: use "text".red(), "text".green(), "text".magenta(), "text".dim(), etc.
"description":"Process-wide runtime feature enablement keyed by canonical feature name.\n\nOnly named features are updated. Omitted features are left unchanged. Send an empty map for a no-op.",
"type":"object"
}
},
"required":[
"enablement"
],
"type":"object"
},
"ExperimentalFeatureListParams":{
"properties":{
"cursor":{
@@ -781,6 +796,36 @@
],
"type":"object"
},
"FsUnwatchParams":{
"description":"Stop filesystem watch notifications for a prior `fs/watch`.",
"properties":{
"watchId":{
"description":"Watch identifier returned by `fs/watch`.",
"type":"string"
}
},
"required":[
"watchId"
],
"type":"object"
},
"FsWatchParams":{
"description":"Start filesystem watch notifications for an absolute path.",
"properties":{
"path":{
"allOf":[
{
"$ref":"#/definitions/AbsolutePathBuf"
}
],
"description":"Absolute file or directory path to watch."
}
},
"required":[
"path"
],
"type":"object"
},
"FsWriteFileParams":{
"description":"Write a file on the host filesystem.",
"properties":{
@@ -871,24 +916,6 @@
}
]
},
"FunctionCallOutputPayload":{
"description":"The payload we send back to OpenAI when reporting a tool call result.\n\n`body` serializes directly as the wire value for `function_call_output.output`. `success` remains internal metadata for downstream handling.",
"properties":{
"body":{
"$ref":"#/definitions/FunctionCallOutputBody"
},
"success":{
"type":[
"boolean",
"null"
]
}
},
"required":[
"body"
],
"type":"object"
},
"FuzzyFileSearchParams":{
"properties":{
"cancellationToken":{
@@ -955,15 +982,6 @@
],
"type":"object"
},
"HazelnutScope":{
"enum":[
"example",
"workspace-shared",
"all-shared",
"personal"
],
"type":"string"
},
"ImageDetail":{
"enum":[
"auto",
@@ -1280,6 +1298,10 @@
},
"PluginInstallParams":{
"properties":{
"forceRemoteSync":{
"description":"When true, apply the remote plugin change before the local install flow.",
"type":"boolean"
},
"marketplacePath":{
"$ref":"#/definitions/AbsolutePathBuf"
},
@@ -1329,6 +1351,10 @@
},
"PluginUninstallParams":{
"properties":{
"forceRemoteSync":{
"description":"When true, apply the remote plugin change before the local uninstall flow.",
"description":"Shell command string evaluated by the thread's configured shell. Unlike `command/exec`, this intentionally preserves shell syntax such as pipes, redirects, and quoting. This runs unsandboxed with full access rather than inheriting the thread sandbox policy.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
},
"AccountLoginCompletedNotification":{
"properties":{
"error":{
@@ -83,6 +87,9 @@
],
"type":"object"
},
"AgentPath":{
"type":"string"
},
"AppBranding":{
"description":"EXPERIMENTAL - app metadata returned by app-list APIs.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -535,6 +564,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -744,6 +774,15 @@
],
"type":"object"
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -963,6 +1002,34 @@
],
"type":"object"
},
"FsChangedNotification":{
"description":"Filesystem watch notification emitted for `fs/watch` subscribers.",
"properties":{
"changedPaths":{
"description":"File or directory paths associated with this event.",
"items":{
"$ref":"#/definitions/AbsolutePathBuf"
},
"type":"array"
},
"watchId":{
"description":"Watch identifier returned by `fs/watch`.",
"type":"string"
}
},
"required":[
"changedPaths",
"watchId"
],
"type":"object"
},
"FuzzyFileSearchMatchType":{
"enum":[
"file",
"directory"
],
"type":"string"
},
"FuzzyFileSearchResult":{
"description":"Superset of [`codex_file_search::FileMatch`]",
"properties":{
@@ -980,6 +1047,9 @@
"null"
]
},
"match_type":{
"$ref":"#/definitions/FuzzyFileSearchMatchType"
},
"path":{
"type":"string"
},
@@ -994,6 +1064,7 @@
},
"required":[
"file_name",
"match_type",
"path",
"root",
"score"
@@ -1134,7 +1205,10 @@
},
"HookEventName":{
"enum":[
"preToolUse",
"postToolUse",
"sessionStart",
"userPromptSubmit",
"stop"
],
"type":"string"
@@ -1179,6 +1253,21 @@
],
"type":"string"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"HookRunStatus":{
"enum":[
"running",
@@ -1398,6 +1487,36 @@
],
"type":"object"
},
"McpServerStartupState":{
"enum":[
"starting",
"ready",
"failed",
"cancelled"
],
"type":"string"
},
"McpServerStatusUpdatedNotification":{
"properties":{
"error":{
"type":[
"string",
"null"
]
},
"name":{
"type":"string"
},
"status":{
"$ref":"#/definitions/McpServerStartupState"
}
},
"required":[
"name",
"status"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -1453,6 +1572,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
}
},
"properties":{
"codexHome":{
"allOf":[
{
"$ref":"#/definitions/AbsolutePathBuf"
}
],
"description":"Absolute path to the server's $CODEX_HOME directory."
},
"platformFamily":{
"description":"Platform family for the running app-server target, for example `\"unix\"` or `\"windows\"`.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"description":"Process-wide runtime feature enablement keyed by canonical feature name.\n\nOnly named features are updated. Omitted features are left unchanged. Send an empty map for a no-op.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
}
},
"description":"Filesystem watch notification emitted for `fs/watch` subscribers.",
"properties":{
"changedPaths":{
"description":"File or directory paths associated with this event.",
"items":{
"$ref":"#/definitions/AbsolutePathBuf"
},
"type":"array"
},
"watchId":{
"description":"Watch identifier returned by `fs/watch`.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
}
},
"description":"Start filesystem watch notifications for an absolute path.",
"properties":{
"path":{
"allOf":[
{
"$ref":"#/definitions/AbsolutePathBuf"
}
],
"description":"Absolute file or directory path to watch."
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
}
},
"description":"Created watch handle returned by `fs/watch`.",
"properties":{
"path":{
"allOf":[
{
"$ref":"#/definitions/AbsolutePathBuf"
}
],
"description":"Canonicalized path associated with the watch."
},
"watchId":{
"description":"Connection-scoped watch identifier used for `fs/unwatch` and `fs/changed`.",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"description":"The payload we send back to OpenAI when reporting a tool call result.\n\n`body` serializes directly as the wire value for `function_call_output.output`. `success` remains internal metadata for downstream handling.",
"properties":{
"body":{
"$ref":"#/definitions/FunctionCallOutputBody"
},
"success":{
"type":[
"boolean",
"null"
]
}
},
"required":[
"body"
],
"type":"object"
},
"GhostCommit":{
"description":"Details of a ghost commit created from a repository state.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -155,6 +177,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -290,6 +313,15 @@
}
]
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -370,6 +402,21 @@
],
"type":"object"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -402,6 +449,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
},
"AgentPath":{
"type":"string"
},
"ApprovalsReviewer":{
"description":"Configures who approval requests are routed to for review. Examples include sandbox escapes, blocked network access, MCP approval prompts, and ARC escalations. Defaults to `user`. `guardian_subagent` uses a carefully prompted subagent to gather relevant context and apply a risk-based decision framework before approving or denying the request.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -217,6 +242,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -352,6 +378,15 @@
}
]
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -455,6 +490,21 @@
},
"type":"object"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -487,6 +537,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -155,6 +180,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -290,6 +316,15 @@
}
]
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -393,6 +428,21 @@
},
"type":"object"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -425,6 +475,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -155,6 +180,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -290,6 +316,15 @@
}
]
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -393,6 +428,21 @@
},
"type":"object"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -425,6 +475,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -155,6 +180,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -290,6 +316,15 @@
}
]
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -393,6 +428,21 @@
},
"type":"object"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -425,6 +475,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"The payload we send back to OpenAI when reporting a tool call result.\n\n`body` serializes directly as the wire value for `function_call_output.output`. `success` remains internal metadata for downstream handling.",
"properties":{
"body":{
"$ref":"#/definitions/FunctionCallOutputBody"
},
"success":{
"type":[
"boolean",
"null"
]
}
},
"required":[
"body"
],
"type":"object"
},
"GhostCommit":{
"description":"Details of a ghost commit created from a repository state.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
},
"AgentPath":{
"type":"string"
},
"ApprovalsReviewer":{
"description":"Configures who approval requests are routed to for review. Examples include sandbox escapes, blocked network access, MCP approval prompts, and ARC escalations. Defaults to `user`. `guardian_subagent` uses a carefully prompted subagent to gather relevant context and apply a risk-based decision framework before approving or denying the request.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -217,6 +242,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -352,6 +378,15 @@
}
]
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -455,6 +490,21 @@
},
"type":"object"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -487,6 +537,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -155,6 +180,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -290,6 +316,15 @@
}
]
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -393,6 +428,21 @@
},
"type":"object"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -425,6 +475,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"Shell command string evaluated by the thread's configured shell. Unlike `command/exec`, this intentionally preserves shell syntax such as pipes, redirects, and quoting. This runs unsandboxed with full access rather than inheriting the thread sandbox policy.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
},
"AgentPath":{
"type":"string"
},
"ApprovalsReviewer":{
"description":"Configures who approval requests are routed to for review. Examples include sandbox escapes, blocked network access, MCP approval prompts, and ARC escalations. Defaults to `user`. `guardian_subagent` uses a carefully prompted subagent to gather relevant context and apply a risk-based decision framework before approving or denying the request.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -217,6 +242,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -352,6 +378,15 @@
}
]
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -455,6 +490,21 @@
},
"type":"object"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -487,6 +537,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -155,6 +180,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -290,6 +316,15 @@
}
]
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -393,6 +428,21 @@
},
"type":"object"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -425,6 +475,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -155,6 +180,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -290,6 +316,15 @@
}
]
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -393,6 +428,21 @@
},
"type":"object"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -425,6 +475,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -155,6 +177,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -290,6 +313,15 @@
}
]
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -370,6 +402,21 @@
],
"type":"object"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -402,6 +449,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -155,6 +177,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -290,6 +313,15 @@
}
]
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -370,6 +402,21 @@
],
"type":"object"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -402,6 +449,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"description":"Returned when `turn/start` or `turn/steer` is submitted while the current active turn cannot accept same-turn steering, for example `/review` or manual `/compact`.",
"properties":{
"activeTurnNotSteerable":{
"properties":{
"turnKind":{
"$ref":"#/definitions/NonSteerableTurnKind"
}
},
"required":[
"turnKind"
],
"type":"object"
}
},
"required":[
"activeTurnNotSteerable"
],
"title":"ActiveTurnNotSteerableCodexErrorInfo",
"type":"object"
}
]
},
@@ -155,6 +177,7 @@
"enum":[
"pendingInit",
"running",
"interrupted",
"completed",
"errored",
"shutdown",
@@ -290,6 +313,15 @@
}
]
},
"CommandExecutionSource":{
"enum":[
"agent",
"userShell",
"unifiedExecStartup",
"unifiedExecInteraction"
],
"type":"string"
},
"CommandExecutionStatus":{
"enum":[
"inProgress",
@@ -370,6 +402,21 @@
],
"type":"object"
},
"HookPromptFragment":{
"properties":{
"hookRunId":{
"type":"string"
},
"text":{
"type":"string"
}
},
"required":[
"hookRunId",
"text"
],
"type":"object"
},
"McpToolCallError":{
"properties":{
"message":{
@@ -402,6 +449,54 @@
],
"type":"string"
},
"MemoryCitation":{
"properties":{
"entries":{
"items":{
"$ref":"#/definitions/MemoryCitationEntry"
},
"type":"array"
},
"threadIds":{
"items":{
"type":"string"
},
"type":"array"
}
},
"required":[
"entries",
"threadIds"
],
"type":"object"
},
"MemoryCitationEntry":{
"properties":{
"lineEnd":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"lineStart":{
"format":"uint32",
"minimum":0.0,
"type":"integer"
},
"note":{
"type":"string"
},
"path":{
"type":"string"
}
},
"required":[
"lineEnd",
"lineStart",
"note",
"path"
],
"type":"object"
},
"MessagePhase":{
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.