## Summary
Stop counting elicitation time towards mcp tool call time. There are
some tradeoffs here, but in general I don't think time spent waiting for
elicitations should count towards tool call time, or at least not
directly towards timeouts.
Elicitations are not exactly like exec_command escalation requests, but
I would argue it's ~roughly equivalent.
## Testing
- [x] Added unit tests
- [x] Tested locally
Addresses #17498
Problem: The TUI derived /status instruction source paths from the local
client environment, which could show stale <none> output or incorrect
paths when connected to a remote app server.
Solution: Add an app-server v2 instructionSources snapshot to thread
start/resume/fork responses, default it to an empty list when older
servers omit it, and render TUI /status from that server-provided
session data.
Additional context: The app-server field is intentionally named
instructionSources rather than AGENTS.md-specific terminology because
the loaded instruction sources can include global instructions, project
AGENTS.md files, AGENTS.override.md, user-defined instruction files, and
future dynamic sources.
Addresses #17313
Problem: The visual context meter in the status line was confusing and
continued to draw negative feedback, and context reporting should remain
an explicit opt-in rather than part of the default footer.
Solution: Remove the visual meter, restore opt-in context remaining/used
percentage items that explicitly say "Context", keep existing
context-usage configs working as a hidden alias, and update the setup
text and snapshots.
## Problem
The TUI had shell-style Up/Down history recall, but `Ctrl+R` did not
provide the reverse incremental search workflow users expect from
shells. Users needed a way to search older prompts without immediately
replacing the current draft, and the interaction needed to handle async
persistent history, repeated navigation keys, duplicate prompt text,
footer hints, and preview highlighting without making the main composer
file even harder to review.
https://github.com/user-attachments/assets/5165affd-4c9a-46e9-adbd-89088f5f7b6b
<img width="1227" height="722" alt="image"
src="https://github.com/user-attachments/assets/8bc83289-eeca-47c7-b0c3-8975101901af"
/>
## Mental model
`Ctrl+R` opens a temporary search session owned by the composer. The
footer line becomes the search input, the composer body previews the
current match only after the query has text, and `Enter` accepts that
preview as an editable draft while `Esc` restores the draft that existed
before search started. The history layer provides a combined offset
space over persistent and local history, but search navigation exposes
unique prompt text rather than every physical history row.
## Non-goals
This change does not rewrite stored history, change normal Up/Down
browsing semantics, add fuzzy matching, or add persistent metadata for
attachments in cross-session history. Search deduplication is
deliberately scoped to the active Ctrl+R search session and uses exact
prompt text, so case, whitespace, punctuation, and attachment-only
differences are not normalized.
## Tradeoffs
The implementation keeps search state in the existing composer and
history state machines instead of adding a new cross-module controller.
That keeps ownership local and testable, but it means the composer still
coordinates visible search status, draft restoration, footer rendering,
cursor placement, and match highlighting while `ChatComposerHistory`
owns traversal, async fetch continuation, boundary clamping, and
unique-result caching. Unique-result caching stores cloned
`HistoryEntry` values so known matches can be revisited without cache
lookups; this is simple and robust for interactive search sizes, but it
is not a global history index.
## Architecture
`ChatComposer` detects `Ctrl+R`, snapshots the current draft, switches
the footer to `FooterMode::HistorySearch`, and routes search-mode keys
before normal editing. Query edits call `ChatComposerHistory::search`
with `restart = true`, which starts from the newest combined-history
offset. Repeated `Ctrl+R` or Up searches older; Down searches newer
through already discovered unique matches or continues the scan.
Persistent history entries still arrive asynchronously through
`on_entry_response`, where a pending search either accepts the response,
skips a duplicate, or requests the next offset.
The composer-facing pieces now live in
`codex-rs/tui/src/bottom_pane/chat_composer/history_search.rs`, leaving
`chat_composer.rs` responsible for routing and rendering integration
instead of owning every search helper inline.
`codex-rs/tui/src/bottom_pane/chat_composer_history.rs` remains the
owner of stored history, combined offsets, async fetch state, boundary
semantics, and duplicate suppression. Match highlighting is computed
from the current composer text while search is active and disappears
when the match is accepted.
## Observability
There are no new logs or telemetry. The practical debug path is state
inspection: `ChatComposer.history_search` tells whether the footer query
is idle, searching, matched, or unmatched; `ChatComposerHistory.search`
tracks selected raw offsets, pending persistent fetches, exhausted
directions, and unique match cache state. If a user reports skipped or
repeated results, first inspect the exact stored prompt text, the
selected offset, whether an async persistent response is still pending,
and whether a query edit restarted the search session.
## Tests
The change is covered by focused `codex-tui` unit tests for opening
search without previewing the latest entry, accepting and canceling
search, no-match restoration, boundary clamping, footer hints,
case-insensitive highlighting, local duplicate skipping, and persistent
duplicate skipping through async responses. Snapshot coverage captures
the footer-mode visual changes. Local verification used `just fmt`,
`cargo test -p codex-tui history_search`, `cargo test -p codex-tui`, and
`just fix -p codex-tui`.
- Let typed user messages submit while realtime is active and mirror
accepted text into the realtime text stream.
- Add integration coverage and snapshot for outbound realtime text.
## Summary
- detect WSL1 before Codex probes or invokes the Linux bubblewrap
sandbox
- fail early with a clear unsupported-operation message when a command
would require bubblewrap on WSL1
- document that WSL2 follows the normal Linux bubblewrap path while WSL1
is unsupported
## Why
Codex 0.115.0 made bubblewrap the default Linux sandbox. WSL1 cannot
create the user namespaces that bubblewrap needs, so shell commands
currently fail later with a raw bwrap namespace error. This makes the
unsupported environment explicit and keeps non-bubblewrap paths
unchanged.
The WSL detection reads /proc/version, lets an explicit WSL<version>
marker decide WSL1 vs WSL2+, and only treats a bare Microsoft marker as
WSL1 when no explicit WSL version is present.
addresses https://github.com/openai/codex/issues/16076
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- register flattened handler aliases for deferred MCP tools
- cover the node_repl-shaped deferred MCP call path in tool registry
tests
## Root Cause
Deferred MCP tools were registered only under their namespaced handler
key, e.g. `mcp__node_repl__:js`. If the model/bridge emitted the
flattened qualified name `mcp__node_repl__js`, core parsed it as an MCP
payload but dispatch looked up the flattened handler key and returned
`unsupported call` before reaching the MCP handler.
## Validation
- `just fmt`
- `cargo test -p codex-tools
search_tool_registers_deferred_mcp_flattened_handlers`
- `cargo test -p codex-core
search_tool_registers_namespaced_mcp_tool_aliases`
- `git diff --check`
Select Current Thread startup context by budget from newest turns, cap
each rendered turn at 300 approximate tokens, and add formatter plus
integration snapshot coverage.
## Summary
- update the guardian timeout guidance to say permission approval review
timed out
- simplify the retry guidance to say retry once or ask the user for
guidance or explicit approval
## Testing
- cargo test -p codex-core
guardian_timeout_message_distinguishes_timeout_from_policy_denial
- cargo test -p codex-core
guardian_review_decision_maps_to_mcp_tool_decision
**Summary**
This PR treats Guardian timeouts as distinct from explicit denials in
the core approval paths.
Timeouts now return timeout-specific guidance instead of Guardian
policy-rejection messaging.
It updates the command, shell, network, and MCP approval flows and adds
focused test coverage.
Addresses #17303
Problem: The standalone codex-tui entrypoint only printed token usage on
exit, so resumable sessions could omit the codex resume footer even when
thread metadata was available.
Solution: Format codex-tui exit output from AppExitInfo so it includes
the same resume hint as the main CLI and reports fatal exits
consistently.
Addresses #17311
Problem: `/stop` stops background terminals, but `/ps` can still show
stale entries because the TUI process cache is cleared only after later
exec end events arrive.
Solution: Clear the TUI's tracked unified exec process list and footer
immediately when `/stop` submits background terminal cleanup.
Addresses #17353
Problem: Codex rate-limit fetching failed when the backend returned the
new `prolite` subscription plan type.
Solution: Add `prolite` to the backend/account/auth plan mappings, keep
unknown WHAM plan values decodable, and regenerate app-server plan
schemas.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
Addresses #17276
Problem: Closing the terminal while the TUI input stream is pending
could leave the app outside the normal shutdown path, which is risky
when an approval prompt is active.
Solution: Treat a closed TUI input stream as ShutdownFirst so existing
thread shutdown behavior cancels pending work and approvals before exit.
# TL;DR
- Adds recognized slash commands to the TUI's local in-session recall
history.
- This is the MVP of the whole feature: it keeps slash-command recall
local only: nothing is written to persistent history, app-server
history, or core history storage.
- Treats slash commands like submitted text once they parse as a known
built-in command, regardless of whether command dispatch later succeeds.
# Problem
Slash commands are handled outside the normal message submission path,
so they could clear the composer without becoming part of the local
Up-arrow recall list. That made command-heavy workflows awkward: after
running `/diff`, `/rename Better title`, `/plan investigate this`, or
even a valid command that reports a usage error, users had to retype the
command instead of recalling and editing it like a normal prompt.
The goal of this PR is to make slash commands feel like submitted input
inside the current TUI session while keeping the change deliberately
local. This is not persistent history yet; it only affects the
composer's in-memory recall behavior.
# Mental model
The composer owns draft state and local recall. When slash input parses
as a recognized built-in command, the composer stages the submitted
command text before returning `InputResult::Command` or
`InputResult::CommandWithArgs`. `ChatWidget` then dispatches the command
and records the staged entry once dispatch returns to the input-result
path.
Command-name recognition is the only validation before local recall. A
valid slash command is recallable whether it succeeds, fails with a
usage error, no-ops, is unavailable while a task is running, or is
skipped by command-specific logic. An unrecognized slash command is
different: it is restored as a draft, surfaces the existing
unrecognized-command message, and is not added to recall.
Bare commands recalled from typed text use the trimmed submitted draft.
Commands selected from the popup record the canonical command text, such
as `/diff`, rather than the partial filter text the user typed. Inline
commands with arguments keep the original command invocation available
locally even when their arguments are later prepared through the normal
submission pipeline.
# Non-goals
Persisting slash commands across sessions is intentionally out of scope.
This change does not modify app-server history, core history storage,
protocol events, or message submission semantics.
This does not change command availability, command side effects, popup
filtering, command parsing, or the semantics of unsupported commands. It
only changes whether recognized slash-command invocations are available
through local Up-arrow recall after the user submits them.
# Tradeoffs
The main tradeoff is that recall is based on command recognition, not
command outcome. This intentionally favors a simpler user model: if the
TUI accepted the input as a slash command, the user can recall and edit
that input just like plain text. That means valid-but-unsuccessful
invocations such as usage errors are recallable, which is useful when
the next action is usually to edit and retry.
The previous accept/reject design required command dispatch to report a
boolean outcome, which made the dispatcher API noisier and forced every
branch to decide history behavior. This version keeps the dispatch APIs
as side-effect-only methods and localizes history recording to the
slash-command input path.
Inline command handling still avoids double-recording by preparing
inline arguments without using the normal message-submission history
path. The staged slash-command entry remains the single local recall
record for the command invocation.
# Architecture
`ChatComposer` stages a pending `HistoryEntry` when recognized
slash-command input is promoted into an input result. The pending entry
mirrors the existing local history payload shape so recall can restore
text elements, local images, remote images, mention bindings, and
pending paste state when those are present.
`BottomPane` exposes a narrow method for recording that staged command
entry because it owns the composer. `ChatWidget` records the staged
entry after dispatching a recognized command from the input-result
match. Valid commands rejected before they reach `ChatWidget`, such as
commands unavailable while a task is running, are staged and recorded in
the composer path that detects the rejection.
Slash-command dispatch itself now lives in
`chatwidget/slash_dispatch.rs` so the behavior is reviewable without
adding more weight to `chatwidget.rs`. The extraction is
behavior-preserving: the dispatch match arms stay intact, while the
input flow in `chatwidget.rs` remains the single place that connects
submitted slash-command input to dispatch.
# Observability
There is no new logging because this is a local UI recall behavior and
the result is directly visible through Up-arrow recall. The practical
debug path is to trace Enter through
`ChatComposer::try_dispatch_bare_slash_command`,
`ChatComposer::try_dispatch_slash_command_with_args`, or popup Enter/Tab
handling, then confirm the recognized command is staged before dispatch
and recorded exactly once afterward.
If a valid command unexpectedly does not appear in recall, check whether
the input path staged slash history before clearing the composer and
whether it used the `ChatWidget` slash-dispatch wrapper. If an
unrecognized command unexpectedly appears in recall, check the parser
branch that should restore the draft instead of staging history.
# Tests
Composer-level tests cover staging and recording for a bare typed slash
command, a popup-selected command, and an inline command with arguments.
Chat-widget tests cover valid commands being recallable after normal
dispatch, inline dispatch, usage errors, task-running unavailability,
no-op stub dispatch, and command-specific skip behavior such as `/init`
when an instructions file already exists. They also cover the negative
case: unrecognized slash commands are not added to local recall.
## Summary
- Add an optional `tags` dictionary to feedback upload params.
- Capture the active app-server turn id in the TUI and submit it as
`tags.turn_id` with `/feedback` uploads.
- Merge client-provided feedback tags into Sentry feedback tags while
preserving reserved system fields like `thread_id`, `classification`,
`cli_version`, `session_source`, and `reason`.
## Behavior / impact
Existing feedback upload callers remain compatible because `tags` is
optional and nullable. The wire shape is still a normal JSON object /
TypeScript dictionary, so adding future feedback metadata will not
require a new top-level protocol field each time. This change only adds
feedback metadata for Codex CLI/TUI uploads; it does not affect existing
pipelines, DAGs, exports, or downstream consumers unless they choose to
read the new `turn_id` feedback tag.
## Tests
- `cargo fmt -- --config imports_granularity=Item` passed; stable
rustfmt warned that `imports_granularity` is nightly-only.
- `cargo run -p codex-app-server-protocol --bin write_schema_fixtures`
- `cargo test -p codex-feedback
upload_tags_include_client_tags_and_preserve_reserved_fields`
- `cargo test -p codex-app-server-protocol
schema_fixtures_match_generated`
- `cargo test -p codex-tui build_feedback_upload_params`
- `cargo test -p codex-tui
live_app_server_turn_started_sets_feedback_turn_id`
- `cargo check -p codex-app-server --tests`
- `git diff --check`
---------
Co-authored-by: Codex <noreply@openai.com>
Addresses #17302
Problem: `thread/list` compared cwd filters with raw path equality, so
`resume --last` could miss Windows sessions when the saved cwd used a
verbatim path form and the current cwd did not.
Solution: Normalize cwd comparisons through the existing path comparison
utilities before falling back to direct equality, and add Windows
regression coverage for verbatim paths. I made this a general utility
function and replaced all of the duplicated instance of it across the
code base.
## Summary
- Update the marketplace add local-source integration test to pass an
explicit relative local path.
- Keep the change test-only; no CLI source parsing behavior changes.
## Tests
- cargo fmt -p codex-cli
- cargo test -p codex-cli --test marketplace_add
## Impact
- Production behavior is unchanged.
- No impact to feedback upload logic, DAGs, exports, or downstream
pipelines.
Co-authored-by: Codex <noreply@openai.com>
## Summary
- keep hostname targets proxied by default by removing hostname suffixes
from the managed `NO_PROXY` value while preserving private/link-local
CIDRs
- make the macOS `allow_local_binding` sandbox rules match the local
socket shape used by DNS tools by allowing wildcard local binds
- allow raw DNS egress to remote port 53 only when `allow_local_binding`
is enabled, without opening blanket outbound network access
## Root cause
Raw DNS tools do not honor `HTTP_PROXY` or `ALL_PROXY`, so the
proxy-only Seatbelt policy blocked their resolver traffic before it
could reach host DNS. In the affected managed config,
`allow_local_binding = true`, but the existing rule only allowed
`localhost:*` binds; `dig`/BIND can bind sockets in a way that needs
wildcard local binding. Separately, hostname suffixes in `NO_PROXY`
could force internal hostnames to resolve locally instead of through the
proxy path.
---------
Co-authored-by: Codex <noreply@openai.com>
Problem: The TUI still depended on `codex-core` directly in a number of
places, and we had no enforcement from keeping this problem from getting
worse.
Solution: Route TUI core access through
`codex-app-server-client::legacy_core`, add CI enforcement for that
boundary, and re-export this legacy bridge inside the TUI as
`crate::legacy_core` so the remaining call sites stay readable. There is
no functional change in this PR — just changes to import targets.
Over time, we can whittle away at the remaining symbols in this legacy
namespace with the eventual goal of removing them all. In the meantime,
this linter rule will prevent us from inadvertently importing new
symbols from core.
## Summary
- Add `TimedOut` to Guardian/review carrier types:
- `ReviewDecision::TimedOut`
- `GuardianAssessmentStatus::TimedOut`
- app-server v2 `GuardianApprovalReviewStatus::TimedOut`
- Regenerate app-server JSON/TypeScript schemas for the new wire shape.
- Wire the new status through core/app-server/TUI mappings with
conservative fail-closed handling.
- Keep `TimedOut` non-user-selectable in the approval UI.
**Does not change runtime behavior yet; emitting `TimeOut` and
parent-model timeout messaging will come in followup PRs**
Problem: The Windows exec-server test command could let separator
whitespace become part of `echo` output, making the exact
retained-output assertion flaky.
Solution: Tighten the Windows `cmd.exe` command by placing command
separators directly after the echoed tokens so stdout remains
deterministic while preserving the exact assertion.
Added a new top-level `codex marketplace add` command for installing
plugin marketplaces into Codex’s local marketplace cache.
This change adds source parsing for local directories, GitHub shorthand,
and git URLs, supports optional `--ref` and git-only `--sparse` checkout
paths, stages the source in a temp directory, validates the marketplace
manifest, and installs it under
`$CODEX_HOME/marketplaces/<marketplace-name>`
Included tests cover local install behavior in the CLI and marketplace
discovery from installed roots in core. Scoped formatting and fix passes
were run, and targeted CLI/core tests passed.
## Summary
- preserve logical symlink paths during permission normalization and
config cwd handling
- bind real targets for symlinked readable/writable roots in bwrap and
remap carveouts and unreadable roots there
- add regressions for symlinked carveouts and nested symlink escape
masking
## Root cause
Permission normalization canonicalized symlinked writable roots and cwd
to their real targets too early. That drifted policy checks away from
the logical paths the sandboxed process can actually address, while
bwrap still needed the real targets for mounts. The mismatch caused
shell and apply_patch failures on symlinked writable roots.
## Impact
Fixes#15781.
Also fixes#17079:
- #17079 is the protected symlinked carveout side: bwrap now binds the
real symlinked writable-root target and remaps carveouts before masking.
Related to #15157:
- #15157 is the broader permission-check side of this path-identity
problem. This PR addresses the shared logical-vs-canonical normalization
issue, but the reported Darwin prompt behavior should be validated
separately before auto-closing it.
This should also fix#14672, #14694, #14715, and #15725:
- #14672, #14694, and #14715 are the same Linux
symlinked-writable-root/bwrap family as #15781.
- #15725 is the protected symlinked workspace path variant; the PR
preserves the protected logical path in policy space while bwrap applies
read-only or unreadable treatment to the resolved target so
file-vs-directory bind mismatches do not abort sandbox setup.
## Notes
- Added Linux-only regressions for symlinked writable ancestors and
protected symlinked directory targets, including nested symlink escape
masking without rebinding the escape target writable.
---------
Co-authored-by: Codex <noreply@openai.com>
## Description
This PR introduces `review_id` as the stable identifier for guardian
reviews and exposes it in app-server `item/autoApprovalReview/started`
and `item/autoApprovalReview/completed` events.
Internally, guardian rejection state is now keyed by `review_id` instead
of the reviewed tool item ID. `target_item_id` is still included when a
review maps to a concrete thread item, but it is no longer overloaded as
the review lifecycle identifier.
## Motivation
We'd like to give users the ability to preempt a guardian review while
it's running (approve or decline).
However, we can't implement the API that allows the user to override a
running guardian review because we didn't have a unique `review_id` per
guardian review. Using `target_item_id` is not correct since:
- with execve reviews, there can be multiple execve calls (and therefore
guardian reviews) per shell command
- with network policy reviews, there is no target item ID
The PR that actually implements user overrides will use `review_id` as
the stable identifier.
## Motivation
The `SessionStart` hook already receives `startup` and `resume` sources,
but sessions created from `/clear` previously looked like normal startup
sessions. This makes it impossible for hook authors to distinguish
between these with the matcher.
## Summary
- Add `InitialHistory::Cleared` so `/clear`-created sessions can be
distinguished from ordinary startup sessions.
- Add `SessionStartSource::Clear` and wire it through core, app-server
thread start params, and TUI clear-session flow.
- Update app-server protocol schemas, generated TypeScript, docs, and
related tests.
https://github.com/user-attachments/assets/9cae3cb4-41c7-4d06-b34f-966252442e5c
# Motivation
Make hook display less noisy and more useful by keeping transient hook
activity out of permanent history unless there is useful output,
preserving visibility for meaningful hook work, and making completed
hook severity easier to scan.
Also addresses some of the concerns in
https://github.com/openai/codex/issues/15497
# Changes
## Demo
https://github.com/user-attachments/assets/9d8cebd4-a502-4c95-819c-c806c0731288
Reverse spec for the behavior changes in this branch:
## Hook Lifecycle Rendering
- Hook start events no longer write permanent history rows like `Running
PreToolUse hook`.
- Running hooks now render in a dedicated live hook area above the
composer. It's similar to the active cell we use for tool calls but its
a separate lane.
- Running hook rows use the existing animation setting.
## Hook Reveal Timing
- We wait 300ms before showing running hook rows and linger for up to
600ms once visible.
- This is so fast hooks don't flash a transient `Running hook` row
before user can read it every time.
- If a fast hook completes with meaningful output, only the completed
hook result is written to history.
- If a fast hook completes successfully with no output, it leaves no
visible trace.
## Completed Hook Output
- Completed hooks with output are sticky, for example `• SessionStart
hook (completed)`.
- Hook output entries are rendered under that row with stable prefixes:
`warning:`, `stop:`, `feedback:`, `hook context:`, and `error:`.
- Blocked hooks show feedback entries, for example `• PreToolUse hook
(blocked)` followed by `feedback: ...`.
- Failed hooks show error entries, for example `• PostToolUse hook
(failed)` followed by `error: ...`.
- Stopped hooks show stop entries and remain visually treated as
non-success.
## Parallel Hook Behavior
- Multiple simultaneously running hooks can be tracked in one live hook
cell.
- Adjacent running hooks with the same hook event name and same status
message collapse into a count, for example `• Running 3 PreToolUse
hooks: checking command policy`.
- Running hooks with different event names or different status messages
remain separate rows.
## Hook Run Identity
- `PreToolUse` and `PostToolUse` hook run IDs now include the tool call
ID which prevents concurrent tool-use hooks from sharing a run ID and
clobbering each other in the UI.
- This ID scoping applies to tool-use hooks only; other hook event types
keep their existing run identity behavior.
## App-Server Hook Notifications
- App-server `HookStarted` and `HookCompleted` notifications use the
same live hook rendering path as core hook events.
- `UserPromptSubmit` hook notifications now render through the same
completed hook output format, including warning and stop entries.