## TL;DR
Fix duplicated reasoning summaries in `tui_app_server`.
<img width="1716" height="912" alt="image"
src="https://github.com/user-attachments/assets/6362f25a-ab1c-4a01-bf10-b5616c9428c2"
/>
During live turns, reasoning text is already rendered incrementally from
`ReasoningSummaryTextDelta`. When the same reasoning item later arrives
via `ItemCompleted`, we should only finalize the reasoning block, not
render the same summary again.
## What changed
- only replay rendered reasoning summaries from completed
`ThreadItem::Reasoning` items
- kept live completed reasoning items as finalize-only
- added a regression test covering the live streaming + completion path
## Why
Without this, the first reasoning summary often appears twice in the
transcript when `model_reasoning_summary = "detailed"` and
`features.tui_app_server = true`.
Migrate `cwd` and related session/config state to `AbsolutePathBuf` so
downstream consumers consistently see absolute working directories.
Add test-only `.abs()` helpers for `Path`, `PathBuf`, and `TempDir`, and
update branch-local tests to use them instead of
`AbsolutePathBuf::try_from(...)`.
For the remaining TUI/app-server snapshot coverage that renders absolute
cwd values, keep the snapshots unchanged and skip the Windows-only cases
where the platform-specific absolute path layout differs.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
---------
Co-authored-by: Codex <noreply@openai.com>
- Changed `requires_mcp_tool_approval` to apply MCP spec defaults when
annotations are missing.
- Unannotated tools now default to:
- `readOnlyHint = false`
- `destructiveHint = true`
- `openWorldHint = true`
- This means unannotated MCP tools now go through approval/ARC
monitoring instead of silently bypassing it.
- Explicitly read-only tools still skip approval unless they are also
explicitly marked destructive.
**Previous behavior**
Failed open for missing annotations, which was unsafe for custom MCP
tools that omitted or forgot annotations.
---------
Co-authored-by: colby-oai <228809017+colby-oai@users.noreply.github.com>
This PR adds code to recover from a narrow app-server timing race where
a follow-up can be sent after the previous turn has already ended but
before the TUI has observed that completion.
Instead of surfacing turn/steer failed: no active turn to steer, the
client now treats that as a stale active-turn cache and falls back to
starting a fresh turn, matching the intended submit behavior more
closely. This is similar to the strategy employed by other app server
clients (notably, the IDE extension and desktop app).
This race exists because the current app-server API makes the client
choose between two separate RPCs, turn/steer and turn/start, based on
its local view of whether a turn is still active. That view is
replicated from asynchronous notifications, so it can be stale for a
brief window. The server may already have ended the turn while the
client still believes it is in progress. Since the choice is made
client-side rather than atomically on the server, tui_app_server can
occasionally send turn/steer for a turn that no longer exists.
## Summary
- keep legacy Windows restricted-token sandboxing as the supported
baseline
- support the split-policy subset that restricted-token can enforce
directly today
- support full-disk read, the same writable root set as legacy
`WorkspaceWrite`, and extra read-only carveouts under those writable
roots via additional deny-write ACLs
- continue to fail closed for unsupported split-only shapes, including
explicit unreadable (`none`) carveouts, reopened writable descendants
under read-only carveouts, and writable root sets that do not match the
legacy workspace roots
## Example
Given a filesystem policy like:
```toml
":root" = "read"
":cwd" = "write"
"./docs" = "read"
```
the restricted-token backend can keep the workspace writable while
denying writes under `docs` by layering an extra deny-write carveout on
top of the legacy workspace-write roots.
A policy like:
```toml
"/workspace" = "write"
"/workspace/docs" = "read"
"/workspace/docs/tmp" = "write"
```
still fails closed, because the unelevated backend cannot reopen the
nested writable descendant safely.
## Stack
-> fix: support split carveouts in windows restricted-token sandbox
#14172
fix: support split carveouts in windows elevated sandbox #14568
## Summary
- remove the fork-startup `build_initial_context` injection
- keep the reconstructed `reference_context_item` as the fork baseline
until the first real turn
- update fork-history tests and the request snapshot, and add a
`TODO(ccunningham)` for remaining nondiffable initial-context inputs
## Why
Fork startup was appending current-session initial context immediately
after reconstructing the parent rollout, then the first real turn could
emit context updates again. That duplicated model-visible context in the
child rollout.
## Impact
Forked sessions now behave like resume for context seeding: startup
reconstructs history and preserves the prior baseline, and the first
real turn handles any current-session context emission.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- Reuse the existing config path resolver for the macOS MDM managed
preferences layer so `writable_roots = ["~/code"]` expands the same way
as file-backed config
- keep the change scoped to the MDM branch in `config_loader`; the
current net diff is only `config_loader/mod.rs` plus focused regression
tests in `config_loader/tests.rs` and `config/service_tests.rs`
- research note: `resolve_relative_paths_in_config_toml(...)` is already
used in several existing configuration paths, including [CLI
overrides](74fda242d3/codex-rs/core/src/config_loader/mod.rs (L152-L163)),
[file-backed managed
config](74fda242d3/codex-rs/core/src/config_loader/mod.rs (L274-L285)),
[normal config-file
loading](74fda242d3/codex-rs/core/src/config_loader/mod.rs (L311-L331)),
[project `.codex/config.toml`
loading](74fda242d3/codex-rs/core/src/config_loader/mod.rs (L863-L865)),
and [role config
loading](74fda242d3/codex-rs/core/src/agent/role.rs (L105-L109))
## Validation
- `cargo fmt --all --check`
- `cargo test -p codex-core
managed_preferences_expand_home_directory_in_workspace_write_roots --
--nocapture`
- `cargo test -p codex-core
write_value_succeeds_when_managed_preferences_expand_home_directory_paths
-- --nocapture`
---------
Co-authored-by: Michael Bolin <mbolin@openai.com>
Co-authored-by: Michael Bolin <bolinfest@gmail.com>
- Removes provenance filtering in the mentions feature for apps and
skills that were installed as part of a plugin.
- All skills and apps for a plugin are mentionable with this change.
## Why
This is a follow-up to #15360. That change fixed the `arg0` helper
setup, but `rmcp-client` still coerced stdio transport environment
values into UTF-8 `String`s before program resolution and process spawn.
If `PATH` or another inherited environment value contains non-UTF-8
bytes, that loses fidelity before it reaches `which` and `Command`.
## What changed
- change `create_env_for_mcp_server()` to return `HashMap<OsString,
OsString>` and read inherited values with `std::env::var_os()`
- change `TransportRecipe::Stdio.env`, `RmcpClient::new_stdio_client()`,
and `program_resolver::resolve()` to keep stdio transport env values in
`OsString` form within `rmcp-client`
- keep the `codex-core` config boundary stringly, but convert configured
stdio env values to `OsString` once when constructing the transport
- update the rmcp-client stdio test fixtures and callers to use
`OsString` env maps
- add a Unix regression test that verifies `create_env_for_mcp_server()`
preserves a non-UTF-8 `PATH`
## How to verify
- `cargo test -p codex-rmcp-client`
- `cargo test -p codex-core mcp_connection_manager`
- `just argument-comment-lint`
Targeted coverage in this change includes
`utils::tests::create_env_preserves_path_when_it_is_not_utf8`, while the
updated stdio transport path is exercised by the existing rmcp-client
tests that construct `RmcpClient::new_stdio_client()`.
### Summary
Add the v2 app-server filesystem watch RPCs and notifications, wire them
through the message processor, and implement connection-scoped watches
with notify-backed change delivery. This also updates the schema
fixtures, app-server documentation, and the v2 integration coverage for
watch and unwatch behavior.
This allows clients to efficiently watch for filesystem updates, e.g. to
react on branch changes.
### Testing
- exercise watch lifecycles for directory changes, atomic file
replacement, missing-file targets, and unwatch cleanup
- move the shared byte-based middle truncation logic from `core` into
`codex-utils-string`
- keep token-specific truncation in `codex-core` so rollout can reuse
the shared helper in the next stacked PR
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- drop `sandbox_permissions` from the sandboxing `ExecOptions` and
`ExecRequest` adapter types
- remove the now-unused plumbing from shell, unified exec, JS REPL, and
apply-patch runtime call sites
- default reconstructed `ExecParams` to `SandboxPermissions::UseDefault`
where the lower-level API still requires the field
## Testing
- `just fmt`
- `just argument-comment-lint`
- `cargo test -p codex-core` (still running locally; first failures
observed in `suite::cli_stream::responses_mode_stream_cli`,
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override`,
and
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_env_fallback`)
Switch plugin-install background MCP OAuth to a silent login path so the
raw authorization URL is no longer printed in normal success cases.
OAuth behavior is otherwise unchanged, with fallback URL output via
stderr still shown only if browser launch fails.
Before:
https://github.com/user-attachments/assets/4bf387af-afa8-4b83-bcd6-4ca6b55da8db
## Summary
Fixes slow `Ctrl+C` exit from the ChatGPT browser-login screen in
`tui_app_server`.
## Root cause
Onboarding-level `Ctrl+C` quit bypassed the auth widget's cancel path.
That let the active ChatGPT login keep running, and in-process
app-server shutdown then waited on the stale login attempt before
finishing.
## Changes
- Extract a shared `cancel_active_attempt()` path in the auth widget
- Use that path from onboarding-level `Ctrl+C` before exiting the TUI
- Add focused tests for canceling browser-login and device-code attempts
- Add app-server shutdown cleanup that explicitly drops any active login
before draining background work
## Summary
Fixes ChatGPT login in `tui_app_server` so the local browser opens again
during in-process login flows.
## Root cause
The app-server backend intentionally starts ChatGPT login with browser
auto-open disabled, expecting the TUI client to open the returned
`auth_url`. The app-server TUI was not doing that, so the login URL was
shown in the UI but no browser window opened.
## Changes
- Add a helper that opens the returned ChatGPT login URL locally
- Call it from the main ChatGPT login flow
- Call it from the device-code fallback-to-browser path as well
- Limit auto-open to in-process app-server handles so remote sessions do
not try to open a browser against a remote localhost callback
## Summary
Fixes early TUI exit paths that could leave the terminal in a dirty
state and cause a stray `%` prompt marker after the app quit.
## Root cause
Both `tui` and `tui_app_server` had early returns after `tui::init()`
that did not guarantee terminal restore. When that happened, shells like
`zsh` inherited the altered terminal state.
## Changes
- Add a restore guard around `run_ratatui_app()` in both `tui` and
`tui_app_server`
- Route early exits through the guard instead of relying on scattered
manual restore calls
- Ensure terminal restore still happens on normal shutdown
- Remove marketplace from left column.
- Change `Can be installed` to `Available`
- Align right-column marketplace + selected-row hint text across states.
- Changes applied to both `tui` and `tui_app_server`.
- Update related snapshots/tests.
<img width="2142" height="590" alt="image"
src="https://github.com/user-attachments/assets/6e60b783-2bea-46d4-b353-f2fd328ac4d0"
/>
- create `codex-git-utils` and move the shared git helpers into it with
file moves preserved for diff readability
- move the `GitInfo` helpers out of `core` so stacked rollout work can
depend on the shared crate without carrying its own git info module
---------
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
## Summary
Fixes a `tui_app_server` bootstrap failure when launching the CLI while
logged out.
## Root cause
During TUI bootstrap, `tui_app_server` fetched `account/rateLimits/read`
unconditionally and treated failures as fatal. When the user was logged
out, there was no ChatGPT account available, so that RPC failed and
aborted startup with:
```
Error: account/rateLimits/read failed during TUI bootstrap
```
## Changes
- Only fetch bootstrap rate limits when OpenAI auth is required and a
ChatGPT account is present
- Treat bootstrap rate-limit fetch failures as non-fatal and fall back
to empty snapshots
- Log the fetch failure at debug level instead of aborting startup
## Why
`shell-tool-mcp` and the Bash fork are no longer needed, but the patched
zsh fork is still relevant for shell escalation and for the
DotSlash-backed zsh-fork integration tests.
Deleting the old `shell-tool-mcp` workflow also deleted the only
pipeline that rebuilt those patched zsh binaries. This keeps the package
removal, while preserving a small release path that can be reused
whenever `codex-rs/shell-escalation/patches/zsh-exec-wrapper.patch`
changes.
## What changed
- removed the `shell-tool-mcp` workspace package, its npm
packaging/release jobs, the Bash test fixture, and the remaining
Bash-specific compatibility wiring
- deleted the old `.github/workflows/shell-tool-mcp.yml` and
`.github/workflows/shell-tool-mcp-ci.yml` workflows now that their
responsibilities have been replaced or removed
- kept the zsh patch under
`codex-rs/shell-escalation/patches/zsh-exec-wrapper.patch` and updated
the `codex-rs/shell-escalation` docs/code to describe the zsh-based flow
directly
- added `.github/workflows/rust-release-zsh.yml` to build only the three
zsh binaries that `codex-rs/app-server/tests/suite/zsh` needs today:
- `aarch64-apple-darwin` on `macos-15`
- `x86_64-unknown-linux-musl` on `ubuntu-24.04`
- `aarch64-unknown-linux-musl` on `ubuntu-24.04`
- extracted the shared zsh build/smoke-test/stage logic into
`.github/scripts/build-zsh-release-artifact.sh`, made that helper
directly executable, and now invoke it directly from the workflow so the
Linux and macOS jobs only keep the OS-specific setup in YAML
- wired those standalone `codex-zsh-*.tar.gz` assets into
`rust-release.yml` and added `.github/dotslash-zsh-config.json` so
releases also publish a `codex-zsh` DotSlash file
- updated the checked-in `codex-rs/app-server/tests/suite/zsh` fixture
comments to explain that new releases come from the standalone zsh
assets, while the checked-in fixture remains pinned to the latest
historical release until a newer zsh artifact is published
- tightened a couple of follow-on cleanups in
`codex-rs/shell-escalation`: the `ExecParams::command` comment now
describes the shell `-c`/`-lc` string more clearly, and the README now
points at the same `git.code.sf.net` zsh source URL that the workflow
uses
## Testing
- `cargo test -p codex-shell-escalation`
- `just argument-comment-lint`
- `bash -n .github/scripts/build-zsh-release-artifact.sh`
- attempted `cargo test -p codex-core`; unrelated existing failures
remain, but the touched `tools::runtimes::shell::unix_escalation::*`
coverage passed during that run
- Remove numeric prefixes for disabled rows in shared list rendering.
These numbers are shortcuts, Ex: Pressing "2" selects option `#2`.
Disabled items can not be selected, so keeping numbers on these items is
misleading.
- Apply the same behavior in both tui and tui_app_server.
- Update affected snapshots for apps/plugins loading and plugin detail
rows.
_**This is a global change.**_
Before:
<img width="1680" height="488" alt="image"
src="https://github.com/user-attachments/assets/4bcf94ad-285f-48d3-a235-a85b58ee58e2"
/>
After:
<img width="1706" height="484" alt="image"
src="https://github.com/user-attachments/assets/76bb6107-a562-42fe-ae94-29440447ca77"
/>
## Summary
- trim contiguous developer/contextual-user pre-turn updates when
rollback cuts back to a user turn
- add a focused history regression test for the trim behavior
- update the rollback request-boundary snapshots to show the fixed
non-duplicating context shape
---------
Co-authored-by: Codex <noreply@openai.com>
built from #14256. PR description from @etraut-openai:
This PR addresses a hole in [PR
11802](https://github.com/openai/codex/pull/11802). The previous PR
assumed that app server clients would respond to token refresh failures
by presenting the user with an error ("you must log in again") and then
not making further attempts to call network endpoints using the expired
token. While they do present the user with this error, they don't
prevent further attempts to call network endpoints and can repeatedly
call `getAuthStatus(refreshToken=true)` resulting in many failed calls
to the token refresh endpoint.
There are three solutions I considered here:
1. Change the getAuthStatus app server call to return a null auth if the
caller specified "refreshToken" on input and the refresh attempt fails.
This will cause clients to immediately log out the user and return them
to the log in screen. This is a really bad user experience. It's also a
breaking change in the app server contract that could break third-party
clients.
2. Augment the getAuthStatus app server call to return an additional
field that indicates the state of "token could not be refreshed". This
is a non-breaking change to the app server API, but it requires
non-trivial changes for all clients to properly handle this new field
properly.
3. Change the getAuthStatus implementation to handle the case where a
token refresh fails by marking the AuthManager's in-memory access and
refresh tokens as "poisoned" so it they are no longer used. This is the
simplest fix that requires no client changes.
I chose option 3.
Here's Codex's explanation of this change:
When an app-server client asks `getAuthStatus(refreshToken=true)`, we
may try to refresh a stale ChatGPT access token. If that refresh fails
permanently (for example `refresh_token_reused`, expired, or revoked),
the old behavior was bad in two ways:
1. We kept the in-memory auth snapshot alive as if it were still usable.
2. Later auth checks could retry refresh again and again, creating a
storm of doomed `/oauth/token` requests and repeatedly surfacing the
same failure.
This is especially painful for app-server clients because they poll auth
status and can keep driving the refresh path without any real chance of
recovery.
This change makes permanent refresh failures terminal for the current
managed auth snapshot without changing the app-server API contract.
What changed:
- `AuthManager` now poisons the current managed auth snapshot in memory
after a permanent refresh failure, keyed to the unchanged `AuthDotJson`.
- Once poisoned, later refresh attempts for that same snapshot fail fast
locally without calling the auth service again.
- The poison is cleared automatically when auth materially changes, such
as a new login, logout, or reload of different auth state from storage.
- `getAuthStatus(includeToken=true)` now omits `authToken` after a
permanent refresh failure instead of handing out the stale cached bearer
token.
This keeps the current auth method visible to clients, avoids forcing an
immediate logout flow, and stops repeated refresh attempts for
credentials that cannot recover.
---------
Co-authored-by: Eric Traut <etraut@openai.com>
Follow up to #15357 by making proactive ChatGPT auth refresh depend on
the access token's JWT expiration instead of treating `last_refresh` age
as the primary source of truth.
* Add
`OutgoingMessageSender::send_server_notification_to_connection_and_wait`
which returns only once message is written to websocket (or failed to do
so)
* Use this mechanism to apply back pressure to stdout/stderr streams of
processes spawned by `command/exec`, to limit them to at most one
message in-memory at a time
* Use back pressure signal to also batch smaller chunks into ≈64KiB ones
This should make commands execution more robust over
high-latency/low-throughput networks
### Summary
Make `FileWatcher` a reusable core component which can be built upon.
Extract skills-related logic into a separate `SkillWatcher`.
Introduce a composable `ThrottledWatchReceiver` to throttle filesystem
events, coalescing affected paths among them.
### Testing
Updated existing unit tests.
- Prefer plugin manifest `interface.displayName` for plugin labels.
- Preserve plugin provenance when handling `list_mcp_tools` so connector
`plugin_display_names` are not clobbered.
- Add a TUI test to ensure plugin-owned app mentions are deduped
correctly.
## Summary
- replace the second-compaction test fixtures with a single ordered
`/responses` sequence
- assert against the real recorded request order instead of aggregating
per-mock captures
- realign the second-summary assertion to the first post-compaction user
turn where the summary actually appears
## Root cause
`compact_resume_after_second_compaction_preserves_history` collected
requests from multiple `mount_sse_once_match` recorders. Overlapping
matchers could record the same HTTP request more than once, so the test
indexed into a duplicated synthetic list rather than the true request
stream. That made the summary assertion depend on matcher evaluation
order and platform-specific behavior.
## Impact
- makes the flaky test deterministic by removing duplicate request
capture from the assertion path
- keeps the change scoped to the test only
## Validation
- `just fmt`
- `just argument-comment-lint`
- `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core
compact_resume_after_second_compaction_preserves_history -- --nocapture`
- repeated the same targeted test 10 times
---------
Co-authored-by: Codex <noreply@openai.com>
Make the inter-agent communication start a turn
As part of this, we disable the v2 notifier to prevent some odd
behaviour where the agent restart working while you're talking to it for
example