Commit Graph

1811 Commits

Author SHA1 Message Date
efrazer-oai
be75785504 fix: fully revert agent identity runtime wiring (#18757)
## Summary

This PR fully reverts the previously merged Agent Identity runtime
integration from the old stack:
https://github.com/openai/codex/pull/17387/changes

It removes the Codex-side task lifecycle wiring, rollout/session
persistence, feature flag plumbing, lazy `auth.json` mutation,
background task auth paths, and request callsite changes introduced by
that stack.

This leaves the repo in a clean pre-AgentIdentity integration state so
the follow-up PRs can reintroduce the pieces in smaller reviewable
layers.

## Stack

1. This PR: full revert
2. https://github.com/openai/codex/pull/18871: move Agent Identity
business logic into a crate
3. https://github.com/openai/codex/pull/18785: add explicit
AgentIdentity auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate auth callsites
through AuthProvider

## Testing

Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
2026-04-21 14:30:55 -07:00
Felipe Coury
e502f0b52d feat(tui): shortcuts to change reasoning level temporarily (#18866)
## Summary

Adds main-chat shortcuts for changing reasoning effort one step at a
time:

- `Alt+,` lowers reasoning (has the `<` arrow on the key)
- `Alt+.` raises reasoning (similarly, has the `>` arrow)

The shortcut updates the active session only. It does not persist the
selected reasoning level as the default for future sessions. In Plan
mode, it applies temporarily to Plan mode without opening the
global-vs-Plan scope prompt.

## Details

The shortcut uses the active model preset to decide which reasoning
levels are valid. If the current session has no explicit reasoning
effort, it starts from the model default. Each keypress moves to the
next supported level in the requested direction.

The shortcut only runs from the main chat surface. If a popup or modal
is open, input remains owned by that UI.

In Plan mode, the shortcut updates the in-memory Plan reasoning override
directly. The model/reasoning picker still keeps the existing scope
prompt for explicit picker changes.

## Notes

Ctrl-plus and Ctrl-minus were considered, but terminals do not deliver
those combinations consistently, so this PR uses Alt shortcuts instead.

If the current effort is unsupported by the selected model, the shortcut
skips to the nearest supported level in the requested direction. If
there is no valid step, it shows the existing boundary message.

## Tests

- `cargo test -p codex-tui reasoning_shortcuts`
- `cargo test -p codex-tui reasoning_effort`
- `cargo test -p codex-tui reasoning_shortcut`
- `cargo test -p codex-tui footer_snapshots`
- `cargo test -p codex-tui`
- `just fix -p codex-tui`
- `./tools/argument-comment-lint/run.py -p codex-tui -- --tests`

---------

Co-authored-by: Eric Traut <etraut@openai.com>
2026-04-21 18:04:03 -03:00
Michael Bolin
f8562bd47b sandboxing: intersect permission profiles semantically (#18275)
## Why

Permission approval responses must not be able to grant more access than
the tool requested. Moving this flow to `PermissionProfile` means the
comparison must be profile-shaped instead of `SandboxPolicy`-shaped, and
cwd-relative special paths such as `:cwd` and `:project_roots` must stay
anchored to the turn that produced the request.

## What changed

This implements semantic `PermissionProfile` intersection in
`codex-sandboxing` for file-system and network permissions. The
intersection accepts narrower path grants, rejects broader grants,
preserves deny-read carve-outs and glob scan depth, and materializes
cwd-dependent special-path grants to absolute paths before they can be
recorded for reuse.

The request-permissions response paths now use that intersection
consistently. App-server captures the request turn cwd before waiting
for the client response, includes that cwd in the v2 approval params,
and core stores the requested profile plus cwd for direct TUI/client
responses and Guardian decisions before recording turn- or
session-scoped grants. The TUI app-server bridge now preserves the
app-server request cwd when converting permission approval params into
core events.

## Verification

- `cargo test -p codex-sandboxing intersect_permission_profiles --
--nocapture`
- `cargo test -p codex-app-server request_permissions_response --
--nocapture`
- `cargo test -p codex-core
request_permissions_response_materializes_session_cwd_grants_before_recording
-- --nocapture`
- `cargo check -p codex-tui --tests`
- `cargo check --tests`
- `cargo test -p codex-tui
app_server_request_permissions_preserves_file_system_permissions`
2026-04-21 10:23:01 -07:00
Eric Traut
4ed722ab8d Move TUI app tests to modules they cover (#18799)
## Summary

The TUI app refactor in #18753 moved the old `app.rs` tests into a
single `app/tests.rs` file. That kept the split mechanically simple, but
it left several focused unit tests far from the modules they exercise.

This PR is a follow-up that moves tests next to the code they cover.

It also adds `tui/src/app/test_support.rs` for shared fixture
construction.

This is just a mechanical refactoring (no functional changes) and does
not affect any production code.
2026-04-21 10:16:51 -07:00
Eric Traut
b7fec54354 Queue follow-up input during user shell commands (#18820)
Fixes #17954.

## Why
When a manual shell command like `!sleep 10` is running, submitting
plain text such as `hi` currently sends that text as a steer for the
active shell turn. User shell turns are not steerable like model turns,
so the TUI can remain stuck in `Working` after the shell command
finishes.

## What Changed
- Detect when the only active work is one or more
`ExecCommandSource::UserShell` commands.
- Queue plain submitted input in that state so it drains after the shell
command and shell turn complete.
- Preserve `!cmd` submissions during running work so explicit shell
commands keep their existing behavior.
- Add regression coverage for the `!sleep 10` plus `hi` flow in
`chatwidget::tests::exec_flow::user_message_during_user_shell_command_is_queued_not_steered`.

## Verification
- Manually confirmed hang before the fix and no hang after the fix
2026-04-21 10:13:13 -07:00
Casey Chow
41652665f5 [codex] Add tmux-aware OSC 9 notifications (#17836)
## Summary
- wrap OSC 9 notifications in tmux's DCS passthrough so terminal
notifications make it through tmux
- use codex-terminal-detection for OSC 9 auto-selection so tmux sessions
inherit the underlying client terminal support
- add focused notification backend tests for plain OSC 9 and
tmux-wrapped output

## Stack
- base PR: #18479
- review order: #18479, then this PR

## Why
Tmux does not forward OSC 9 notifications directly; the sequence has to
be wrapped in tmux's DCS passthrough envelope. Codex also had local
notification heuristics that could miss supported terminals when running
under tmux, even though codex-terminal-detection already knows how to
attribute tmux sessions to the client terminal.

## Validation
- `just fmt`
- `cargo test -p codex-tui` *(currently blocked by an unrelated existing
compile error in `app-server/src/message_processor.rs:754` referencing
`connection_id` out of scope; not caused by this change)*

Co-authored-by: Codex <noreply@openai.com>
2026-04-21 17:10:36 +00:00
Felipe Coury
1101dec9ae fix(tui): disable enhanced keys for VS Code WSL (#18741)
Fixes https://github.com/openai/codex/issues/13638

## Why

VS Code's integrated terminal can run a Linux shell through WSL without
exposing `TERM_PROGRAM` to the Linux process, and with crossterm
keyboard enhancement flags enabled that environment can turn dead-key
composition into malformed key events instead of composed Unicode input.
Codex already handles composed Unicode correctly, so the fix is to avoid
enabling the terminal mode that breaks this path for the affected
terminal combination.

## What Changed

- Automatically skip crossterm keyboard enhancement flags when Codex
detects WSL plus VS Code, including a Windows-side `TERM_PROGRAM` probe
through WSL interop.
- Add `CODEX_TUI_DISABLE_KEYBOARD_ENHANCEMENT` so users can
force-disable or force-enable the keyboard enhancement policy for
diagnosis.

## Verification

- Added unit coverage for env parsing, VS Code detection, and the WSL/VS
Code auto-disable policy.
- `cargo check -p codex-tui` passed.
- `./tools/argument-comment-lint/run.py -p codex-tui -- --tests` passed.
- `cargo test -p codex-tui` was attempted locally, but the checkout
failed during linking before tests executed because V8 symbols from
`codex-code-mode` were unresolved for `arm64`.
2026-04-21 09:57:51 -03:00
Abhinav
ef071cf816 show bash mode in the TUI (#18271)
## What

- Explicitly show our "bash mode" by changing the color and adding a
callout similar to how we do for `Plan mode (shift + tab to cycle)`
- Also replace our `›` composer prefix with a bang `!`


![](https://github.com/user-attachments/assets/f5549c75-3a03-433d-aa57-e4c6d0682c49)

## Why

- It was unclear that we had a Bash mode
- This feels more responsive
- It looks cool!

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-21 00:15:49 -07:00
pash-openai
dc1a8f2190 [tool search] support namespaced deferred dynamic tools (#18413)
Deferred dynamic tools need to round-trip a namespace so a tool returned
by `tool_search` can be called through the same registry key that core
uses for dispatch.

This change adds namespace support for dynamic tool specs/calls,
persists it through app-server thread state, and routes dynamic tool
calls by full `ToolName` while still sending the app the leaf tool name.
Deferred dynamic tools must provide a namespace; non-deferred dynamic
tools may remain top-level.

It also introduces `LoadableToolSpec` as the shared
function-or-namespace Responses shape used by both `tool_search` output
and dynamic tool registration, so dynamic tools use the same wrapping
logic in both paths.

Validation:
- `cargo test -p codex-tools`
- `cargo test -p codex-core tool_search`

---------

Co-authored-by: Sayan Sisodiya <sayan@openai.com>
2026-04-21 14:13:08 +08:00
Abhinav
ab26554a3a Add remote_sandbox_config to our config requirements (#18763)
## Why

Customers need finer-grained control over allowed sandbox modes based on
the host Codex is running on. For example, they may want stricter
sandbox limits on devboxes while keeping a different default elsewhere.

Our current cloud requirements can target user/account groups, but they
cannot vary sandbox requirements by host. That makes remote development
environments awkward because the same top-level `allowed_sandbox_modes`
has to apply everywhere.

## What

Adds a new `remote_sandbox_config` section to `requirements.toml`:

```toml
allowed_sandbox_modes = ["read-only"]

[[remote_sandbox_config]]
hostname_patterns = ["*.org"]
allowed_sandbox_modes = ["read-only", "workspace-write"]

[[remote_sandbox_config]]
hostname_patterns = ["*.sh", "runner-*.ci"]
allowed_sandbox_modes = ["read-only", "danger-full-access"]
```

During requirements resolution, Codex resolves the local host name once,
preferring the machine FQDN when available and falling back to the
cleaned kernel hostname. This host classification is best effort rather
than authenticated device proof.

Each requirements source applies its first matching
`remote_sandbox_config` entry before it is merged with other sources.
The shared merge helper keeps that `apply_remote_sandbox_config` step
paired with requirements merging so new requirements sources do not have
to remember the extra call.

That preserves source precedence: a lower-precedence requirements file
with a matching `remote_sandbox_config` cannot override a
higher-precedence source that already set `allowed_sandbox_modes`.

This also wires the hostname-aware resolution through app-server,
CLI/TUI config loading, config API reads, and config layer metadata so
they all evaluate remote sandbox requirements consistently.

## Verification

- `cargo test -p codex-config remote_sandbox_config`
- `cargo test -p codex-config host_name`
- `cargo test -p codex-core
load_config_layers_applies_matching_remote_sandbox_config`
- `cargo test -p codex-core
system_remote_sandbox_config_keeps_cloud_sandbox_modes`
- `cargo test -p codex-config`
- `cargo test -p codex-core` unit tests passed; `tests/all.rs`
integration matrix was intentionally stopped after the relevant focused
tests passed
- `just fix -p codex-config`
- `just fix -p codex-core`
- `cargo check -p codex-app-server`
2026-04-21 05:05:02 +00:00
Dylan Hurd
86535c9901 feat(auto-review) Handle request_permissions calls (#18393)
## Summary
When auto-review is enabled, it should handle request_permissions tool.
We'll need to clean up the UX but I'm planning to do that in a separate
pass

## Testing
- [x] Ran locally
<img width="893" height="396" alt="Screenshot 2026-04-17 at 1 16 13 PM"
src="https://github.com/user-attachments/assets/4c045c5f-1138-4c6c-ac6e-2cb6be4514d8"
/>

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-20 21:48:57 -07:00
canvrno-oai
2cc146f5ea Fallback display names for TUI skill mentions (#18786)
This updates TUI skill mentions to show a fallback label when a skill
does not define a display name, so unnamed skills remain understandable
in the picker without changing behavior for skills that already have
one.

<img width="1028" height="198" alt="Screenshot 2026-04-20 at 6 25 15 PM"
src="https://github.com/user-attachments/assets/84077b85-99d0-4db9-b533-37e1887b4506"
/>
2026-04-20 20:46:55 -07:00
Michael Bolin
3d2f123895 protocol: preserve glob scan depth in permission profiles (#18713)
## Why

#18274 made `PermissionProfile` the canonical file-system permissions
shape, but the round-trip from `FileSystemSandboxPolicy` to
`PermissionProfile` still dropped one piece of policy metadata:
`glob_scan_max_depth`.

That field is security-relevant for deny-read globs such as `**/*.env`.
On Linux, bubblewrap sandbox construction uses it to bound unreadable
glob expansion. If a profile copied from active runtime permissions
loses this value and is submitted back as an override, the resulting
`FileSystemSandboxPolicy` can behave differently even though the visible
permission entries look equivalent.

## What changed

- Add `glob_scan_max_depth` to protocol `FileSystemPermissions` and
preserve it when converting to/from `FileSystemSandboxPolicy`.
- Keep legacy `read`/`write` JSON for simple path-only permissions, but
force canonical JSON when glob scan depth is present so the metadata is
not silently dropped.
- Carry `globScanMaxDepth` through app-server
`AdditionalFileSystemPermissions`, generated JSON/TypeScript schemas,
and app-server/TUI conversion call sites.
- Preserve the metadata through sandboxing permission normalization,
merging, and intersection.
- Carry the merged scan depth into the effective
`FileSystemSandboxPolicy` used for command execution, so bounded
deny-read globs reach Linux bubblewrap materialization.

## Verification

- `cargo test -p codex-sandboxing glob_scan -- --nocapture`
- `cargo test -p codex-sandboxing policy_transforms -- --nocapture`
- `just fix -p codex-sandboxing`





---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18713).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* #18279
* #18278
* #18277
* #18276
* #18275
* __->__ #18713
2026-04-20 19:42:45 -07:00
canvrno-oai
9a2b34213b /statusline & /title - Shared preview values (#18435)
This PR makes the `/statusline` and `/title` setup UIs share one
preview-value source instead of each surface using its own examples.
Both pickers now render consistent live values when available, and
stable placeholders when they are not. It also resolves live preview
values at the shared preview-item layer, so `/title` preview can use
real runtime values for title-specific cases like status text, task
progress, and project-name fallback behavior.

- Adds a shared preview data model for status surfaces
- Maps status-line items and terminal-title items onto that shared
preview list
- Feeds both setup views from the same chatwidget-derived preview data,
with terminal-title-specific formatting applied before `/title` preview
renders
- Keeps project-root preview aligned with status-line behavior while
project in /title keeps its title fallback/truncation behavior
- Adds snapshot coverage for live-only, hardcoded-only, and mixed cases

Test Steps
- Open Codex TUI and launch `/statusline`.
- Toggle and reorder items, then verify the preview uses current session
values when possible, and placeholder values for missing values (ex: no
thread ID).
- Open `/title` and verify it shows the same normalized values,
including live status/task-progress values when available.
2026-04-20 17:46:11 -07:00
Eric Traut
216e7a0a56 Warn when trusting Git subdirectories (#18602)
Addresses #18505

## Summary

When Codex is launched from a subdirectory of a Git repository, the
onboarding trust prompt says it is trusting the current directory even
though the persisted trust target is the repository root. That can make
the scope of the trust decision unclear.

This updates the TUI trust prompt to show a yellow note only when the
current directory differs from the resolved trust target, explaining
that trust applies to the repository root and displaying that root. It
also removes the stale onboarding TODO now that the warning is
implemented.
2026-04-20 16:43:21 -07:00
Eric Traut
0f1c9b8963 Fix exec inheritance of root shared flags (#18630)
Addresses #18113

Problem: Shared flags provided before the exec subcommand were parsed by
the root CLI but not inherited by the exec CLI, so exec sessions could
run with stale or default sandbox and model configuration.

Solution: Move shared TUI and exec flags into a common option block and
merge root selections into exec before dispatch, while preserving exec's
global subcommand flag behavior.
2026-04-20 16:12:17 -07:00
Eric Traut
2af4f15479 Refactor TUI app module into submodules (#18753)
## Why

The TUI app module had grown past the 512K source-file cap enforced by
CI/CD. This keeps the app entry point below that limit while preserving
the existing runtime behavior and test surface.

## What changed

- Kept the top-level `App` state and run-loop wiring in
`tui/src/app.rs`.
- Split app responsibilities into focused private submodules under
`tui/src/app/`, covering event dispatch, thread routing, session
lifecycle, config persistence, background requests, startup prompts,
input, history UI, platform actions, and thread event buffering.
- Moved the existing app-level tests into `tui/src/app/tests.rs` and
reused the existing snapshot location rather than adding new tests or
snapshots.
- Added module header comments for `app.rs` and the new submodules.

## Follow-up

A future cleanup can move narrow unit tests from `tui/src/app/tests.rs`
into the specific app submodules they exercise. This PR keeps the
existing app-level tests together so the refactor stays focused on the
source-file split.

## Verification

- `cargo test -p codex-tui --lib
app::tests::agent_picker_item_name_snapshot`
- `cargo test -p codex-tui --lib app::tests::clear_ui`
- `cargo test -p codex-tui --lib
app::tests::ctrl_l_clear_ui_after_long_transcript_reuses_clear_header_snapshot`
- `just fix -p codex-tui`

Full `cargo test -p codex-tui` still fails on model-catalog drift
unrelated to this refactor, including stale
`gpt-5.3-codex`/`gpt-5.1-codex` snapshot and migration expectations now
resolving to `gpt-5.4`.
2026-04-20 16:10:35 -07:00
guinness-oai
1029742cf7 Add realtime silence tool (#18635)
## Summary

Adds a second realtime v2 function tool, `remain_silent`, so the
realtime model has an explicit non-speaking action when the
collaboration mode or latest context says it should not answer aloud.
This is stacked on #18597.

## Design

- Advertise `remain_silent` alongside `background_agent` in realtime v2
conversational sessions.
- Parse `remain_silent` function calls into a typed
`RealtimeEvent::NoopRequested` event.
- Have core answer that function call with an empty
`function_call_output` and deliberately avoid `response.create`, so no
follow-up realtime response is requested.
- Keep the event hidden from app-server/TUI surfaces; it is operational
plumbing, not user-visible conversation content.
2026-04-20 15:43:20 -07:00
Eric Traut
b8e78e8869 Use app server metadata for fork parent titles (#18632)
## Problem
The TUI resolved fork parent titles from local CODEX_HOME metadata,
which could show missing or stale titles when app-server metadata is
authoritative.

This is a lingering bug left over from the migration of the TUI to the
app-server interface. I found it when I asked Codex to review all places
where the TUI code was still directly accessing the local CODEX_HOME.

## Solution
Route fork parent title metadata through the app-server session state
and render only that supplied title, with focused snapshot coverage for
stale local metadata.

## Testing
I manually tested by renaming a thread then forking it and confirming
that the "forked from" message indicated the parent thread's name.
2026-04-20 15:37:31 -07:00
Felipe Coury
cebe57b723 fix(tui): keep /copy aligned with rollback (#18739)
## Why

Fixes #18718.

After rewinding a thread, `/copy` could still copy the latest assistant
response from before the rewind. The transcript cells were rolled back,
but the copy source was a single `last_agent_markdown` cache that was
not synchronized with backtracking, so the visible conversation and
copied content could diverge.

## What changed

`ChatWidget` now keeps a bounded copy history for the most recent 32
assistant responses, keyed by the visible user-turn count. When local
rollback trims transcript cells, the copy cache is trimmed to the same
surviving user-turn count so `/copy` uses the latest visible assistant
response.

If the user rewinds past the retained copy window, `/copy` now reports:

```text
Cannot copy that response after rewinding. Only the most recent 32 responses are available to /copy.
```

The change also adds coverage for copying the latest surviving response
after rollback and for the over-limit rewind message.

## Verification

- Manually resumed a synthetic 35-turn session, rewound within the
retained window, and verified `/copy` copied the surviving response.
- Manually rewound past the retained window and verified `/copy` showed
the 32-response limit message.
- `cargo test -p codex-tui slash_copy`
- `just fix -p codex-tui`
- `cargo insta pending-snapshots`

Note: `cargo test -p codex-tui` currently fails on unrelated model
catalog and snapshot drift around the default model changing to
`gpt-5.4`; the focused `/copy` tests pass after fixing the new test
setup.
2026-04-20 19:24:10 -03:00
Ahmed Ibrahim
cc96a03f10 Fix stale model test fixtures (#18719)
Fixes stale test fixtures left after the active bundled model catalog
updates in #18586 and #18388. Those changes made `gpt-5.4` the current
default and removed several older hardcoded slugs, which left Windows
Bazel shards failing TUI and config tests.

What changed:
- Refresh TUI model migration, availability NUX, plan-mode, status, and
snapshot fixtures to use active bundled model slugs.
- Update the config edit test expectation for the TOML-quoted
`"gpt-5.2"` migration key.
- Move the model catalog tests into
`codex-rs/tui/src/app/tests/model_catalog.rs` so touching them does not
trip the blob-size policy for `app.rs`.

Verification:
- CI Bazel/lint checks are expected to cover the affected test shards.
2026-04-20 21:52:30 +00:00
Eric Traut
baa5dd7b29 Surface TUI skills refresh failures (#18627)
## Why

`skills/list` refreshes are best-effort metadata updates. If one fails
during startup or thread switching, the TUI should keep running and show
enough detail to diagnose the app-server failure instead of leaving the
user with only a log entry.

This addresses the recoverability and observability issue reported in
#16914.

## What Changed

- Preserve the full startup `skills/list` error chain before sending it
back through the app event queue.
- Surface failed skills refreshes as recoverable TUI error messages
while still logging the warning.

This is related to the recent bug fix from [PR
#18370](https://github.com/openai/codex/pull/18370).
2026-04-20 14:43:04 -07:00
Eric Traut
164b6a0c78 Remove simple TUI legacy_core reexports (#18631)
## Problem
The TUI still imported path utilities and config-loader symbols through
app-server-client's legacy_core facade even though those APIs already
exist in utility/config crates. This is part of our ongoing effort to
whittle away at these old dependencies.

## Solution
Rewire imports to avoid the TUI directly importing from the core crate
and instead import from common lower-level crates. This PR doesn't
include any functional changes; it's just a simple rewiring.
2026-04-20 10:48:27 -07:00
Akshay Nathan
34a3e85fcd Wire the PatchUpdated events through app_server (#18289)
Wires patch_updated events through app_server. These events are parsed
and streamed while apply_patch is being written by the model. Also adds 500ms of buffering to the patch_updated events in the diff_consumer.

The eventual goal is to use this to display better progress indicators in
the codex app.
2026-04-20 10:44:03 -07:00
Ahmed Ibrahim
316cf0e90b Update models.json (#18586)
- Replace the active models-manager catalog with the deleted core
catalog contents.
- Replace stale hardcoded test model slugs with current bundled model
slugs.
- Keep this as a stacked change on top of the cleanup PR.
2026-04-20 10:27:01 -07:00
Michael Bolin
dcec516313 protocol: canonicalize file system permissions (#18274)
## Why

`PermissionProfile` needs stable, canonical file-system semantics before
it can become the primary runtime permissions abstraction. Without a
canonical form, callers have to keep re-deriving legacy sandbox maps and
profile comparisons remain lossy or order-dependent.

## What changed

This adds canonicalization helpers for `FileSystemPermissions` and
`PermissionProfile`, expands special paths into explicit sandbox
entries, and updates permission request/conversion paths to consume
those canonical entries. It also tightens the legacy bridge so root-wide
write profiles with narrower carveouts are not silently projected as
full-disk legacy access.

## Verification

- `cargo test -p codex-protocol
root_write_with_read_only_child_is_not_full_disk_write -- --nocapture`
- `cargo test -p codex-sandboxing permission -- --nocapture`
- `cargo test -p codex-tui permissions -- --nocapture`
2026-04-20 09:57:03 -07:00
Eric Traut
0dc503ba6e Surface parent thread status in side conversations (#18591)
## Summary

Side conversations can hide important state changes from the parent
conversation while the user is focused on the side thread. In
particular, the parent may finish, fail, need user input, or require an
approval while the side conversation remains visible. Users need a
lightweight signal for those states, but parent approval overlays should
not interrupt the side conversation itself.

This change adds parent-conversation status to the side conversation
context label and defers parent interactive overlays while side mode is
active. When the user exits side mode, pending parent approvals and
input requests are restored in the main thread. The pending approval
footer avoids duplicating the same parent approval status, and replayed
notice cells are filtered when restoring a pending interactive request
so tips or warnings do not crowd out the approval prompt.

The change is contained to the TUI side-conversation and thread replay
paths.

Example 1: Approval pending
<img width="752" height="35" alt="Screenshot 2026-04-19 at 12 56 07 PM"
src="https://github.com/user-attachments/assets/1cc0f1a3-9cab-4d60-aed2-96523ccafc20"
/>

Example 2: Turn complete
<img width="754" height="35" alt="Screenshot 2026-04-19 at 12 56 27 PM"
src="https://github.com/user-attachments/assets/653521a5-e298-4366-ae1c-72b56eb88eeb"
/>
2026-04-20 09:00:44 -07:00
Eric Traut
43a69c50eb Use app server thread names in TUI picker (#18633)
## Problem

The TUI resume/fork picker was backfilling thread names from local
rollout indexes. This was left over from before the TUI was moved to the
app server. It should be using app-server APIs because the TUI might be
connected to a remote connection.

This bug wasn't (yet) reported by a user. I found it by asking Codex to
review places in the TUI code where it was still directly accessing the
CODEX_HOME directory rather than going through app-server APIs.

## Solution

The resume picker and session lookups should use app-server thread APIs
only. Remove legacy rollout name/list backfills, and avoid local name
reads in fork history.

## Testing

I manually tested `codex resume` and `codex resume --all` to look for
functional or performance regressions in the resume picker.
2026-04-20 08:16:24 -07:00
Eric Traut
5a8700abcc Add verbose diagnostics for /mcp (#18610)
Fixes #18539.

## Summary
The recent `/mcp` performance work kept the default command fast by
avoiding resource and resource-template inventory probes, but it also
removed useful diagnostics for users trying to confirm MCP server state.

This keeps bare `/mcp` on the fast tools/auth path and adds `/mcp
verbose` for the slower diagnostic view. Verbose mode requests full MCP
server status from the app-server and restores status, resources, and
resource templates in the TUI output.

## Testing
In addition to running automation, I manually tested the feature to
confirm that it works.
2026-04-20 08:13:44 -07:00
jif-oai
7e5588699d chore: drop review prompt from TUI UX (#18659)
Due to the app-server rebase of the TUI, the review prompt was leaked
into the transcript on the TUI
This is not a security issue but it was bad UX. This PR fixes this
2026-04-20 14:31:37 +01:00
richardopenai
3c75f9b4dd [codex] Add workspace owner usage nudge UI (#18221)
## Summary

Third PR in the split from #17956. Stacked on #18220.

- shows workspace-owner/member-specific rate-limit messages behind
`workspace_owner_usage_nudge`
- prompts workspace members to notify the owner or request a usage-limit
increase
- sends the confirmed nudge through the app-server API and renders
completion feedback
- adds focused TUI snapshot coverage for prompts and completion states
- feature gate

## Validation

- `cargo test -p codex-backend-client`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server rate_limits`
- `cargo test -p codex-tui workspace_`
- `cargo test -p codex-tui status_`
- `just fmt`
- `just fix -p codex-backend-client`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
2026-04-20 05:51:47 +00:00
Eric Traut
87fc21ff60 TUI: remove simple legacy_core re-exports (#18605)
## Summary

The TUI still imported several symbols through the transitional
app-server-client `legacy_core` facade even though those symbols are
already owned by smaller crates. This PR narrows that facade by rewiring
those imports directly to their owner crates.

## Changes

No functional changes, just import rewiring. This is part of our ongoing
effort to whittle away at the `legacy_core` namespace, which represents
all of the remaining symbols that the TUI imports from the core.
2026-04-19 22:39:53 -07:00
Eric Traut
fa8943fe7e Use thread IDs in TUI resume hints (#18440)
## Summary

Fixes #18313.

Recent TUI resume breadcrumbs could print a thread title instead of the
stable thread UUID. For sessions whose title was auto-derived from the
first prompt, that made the suggested codex resume command look like it
should resume a long prompt rather than the session ID.

This updates the TUI and CLI post-exit resume hints, plus the in-session
summary shown when switching/forking threads, to always use the stable
thread ID for these recovery breadcrumbs. Explicit name-based resume
support remains available elsewhere.
2026-04-19 22:38:48 -07:00
pash-openai
d58d3ccfec Soften Fast mode plan usage copy (#18601)
Fast mode TUI copy currently names a specific plan-usage multiplier in
two lightweight promo/help surfaces. This swaps that exact multiplier
language for the broader increased plan usage wording we use elsewhere.

There are no behavior changes here; the slash command and startup tip
still point users at the same Fast mode flow.
2026-04-20 00:37:40 +00:00
Eric Traut
ce0e28ea6f Avoid redundant memory enable notice (#18580)
## Summary

Fixes #18554.

The `/experimental` menu can submit the full experimental feature state
even when the user presses Enter without toggling anything. Previously,
Codex showed `Memories will be enabled in the next session.` whenever
the submitted updates included `Feature::MemoryTool = true`, so sessions
where Memories were already enabled could show a redundant warning on a
no-op save.

This change records whether `Feature::MemoryTool` was enabled before
applying feature updates and only emits the next-session notice when
Memories actually transitions from disabled to enabled.
2026-04-19 13:48:15 -07:00
Eric Traut
95dafbc7b5 Add /side conversations (#18190)
The TUI supports long-running turns and agent threads, but quick side
questions have required interrupting the main flow or manually
forking/navigating threads. This PR adds a guarded `/side` flow so users
can ask brief side-conversation questions in an ephemeral fork while
keeping the primary thread focused. This also helps address the feature
request in #18125.

The implementation creates one side conversation at a time, lets `/side`
open either an empty side thread or immediately submit `/side
<question>`, and returns to the parent with Esc or Ctrl+C. Side
conversations get hidden developer guardrails that treat inherited
history as reference-only and steer the model away from workspace
mutations unless explicitly requested in the side conversation.

The TUI hides most slash commands while side mode is active, leaving
only `/copy`, `/diff`, `/mention`, and `/status` available there.
2026-04-19 11:59:41 -07:00
Eric Traut
917a85b0d6 Queue slash and shell prompts in the TUI (#18542)
## Why

Users have asked to queue follow-up slash commands while a task is
running, including in #14081, #14588, #14286, and #13779. The previous
TUI behavior validated slash commands immediately, so commands that are
only meaningful once the current turn is idle could not be queued
consistently.

The queue should preserve what the user typed and defer command parsing
until the item is actually dispatched. This also gives `/fast`, `/review
...`, `/rename ...`, `/model`, `/permissions`, and similar slash
workflows the same FIFO behavior as plain queued prompts.

## What Changed

- Added a queued-input action enum so queued items can be dispatched as
plain prompts, slash commands, or user shell commands.
- Changed `Tab` queueing to accept slash-led prompts without validating
them up front, then parse and dispatch them when dequeued.
- Added `!` shell-command queueing for `Tab` while a task is running,
while preserving existing `Enter` behavior for immediate shell
execution.
- Moved queued slash dispatch through shared slash-command parsing so
inline commands, unavailable commands, unknown commands, and local
config commands report at dequeue time.
- Continued queue draining after local-only actions and after slash menu
cancellation or selection when no task is running.
- Preserved slash-popup completion behavior so `/mo<Tab>` completes to
`/model ` instead of queueing the prefix.
- Updated pending-input preview snapshots to show queued follow-up
inputs.

## Verification

I did a bunch of manual validation (and found and fixed a few bugs along
the way).
2026-04-19 10:52:16 -07:00
Felipe Coury
241136b0e9 feat(tui): show context used in plan implementation prompt (#18573)
# Summary

When a user finishes planning, the TUI asks whether to implement in the
current conversation or start fresh with the approved plan. The
clear-context choice is easier to evaluate when the prompt shows how
much context has already been used, because the user can see when
carrying the full prior conversation is likely to be less useful than
preserving only the plan.

<img width="1612" height="1312" alt="image"
src="https://github.com/user-attachments/assets/694bcf87-8be5-4e88-a412-e562af62d5f7"
/>
    
This PR adds that context signal directly to the clear-context option
while keeping the copy compact enough for the Plan-mode selection popup.

# What Changed

- Compute an optional context-usage label when opening the plan
implementation prompt.
- Show the label only on `Yes, clear context and implement`, where it
informs the cleanup decision.
- Prefer a percentage-used label when context-window information is
available, with a compact token-used fallback when only token totals are
known.
- Preserve the original option description when usage is unknown or
effectively zero.
- Add rustdoc comments around the prompt-copy boundary so future changes
keep the context label formatting and selection rendering
responsibilities clear.

# Testing

- `cargo test -p codex-tui plan_implementation`

# Notes

The footer continues to show context remaining as ambient status. The
implementation prompt intentionally shows context used because the user
is choosing whether to clean up the current thread before
implementation.
2026-04-19 14:01:58 -03:00
xl-openai
3f7222ec76 feat: Budget skill metadata and surface trimming as a warning (#18298)
Cap the model-visible skills section to a small share of the context
window, with a fallback character budget, and keep only as many implicit
skills as fit within that budget.

Emit a non-fatal warning when enabled skills are omitted, and add a new
app-server warning notification

Record thread-start skill metrics for total enabled skills, kept skills,
and whether truncation happened

---------

Co-authored-by: Matthew Zeng <mzeng@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-04-17 18:11:47 -07:00
alexsong-oai
93ff798e5b [TUI] add external config migration prompt when start TUI (#17891)
- add a TUI startup migration prompt for external agent config
- support migrating external configs including config, skills, AGENTS.md
and plugins
- gate the prompt behind features.external_migrate (default false)

<img width="1037" height="480" alt="Screenshot 2026-04-14 at 9 29 14 PM"
src="https://github.com/user-attachments/assets/6060849b-03cb-429a-9c13-c7bb46ad2e65"
/>
<img width="713" height="183" alt="Screenshot 2026-04-14 at 9 29 26 PM"
src="https://github.com/user-attachments/assets/d13f177e-d4c4-479c-8736-ef29636081e1"
/>

---------

Co-authored-by: Eric Traut <etraut@openai.com>
2026-04-17 17:58:32 -07:00
viyatb-oai
370bed4bf4 fix: trust-gate project hooks and exec policies (#14718)
## Summary
- trust-gate project `.codex` layers consistently, including repos that
have `.codex/hooks.json` or `.codex/execpolicy/*.rules` but no
`.codex/config.toml`
- keep disabled project layers in the config stack so nested trusted
project layers still resolve correctly, while preventing hooks and exec
policies from loading until the project is trusted
- update app-server/TUI onboarding copy to make the trust boundary
explicit and add regressions for loader, hooks, exec-policy, and
onboarding coverage

## Security
Before this change, an untrusted repo could auto-load project hooks or
exec policies from `.codex/` as long as `config.toml` was absent. This
makes trust the single gate for project-local config, hooks, and exec
policies.

## Stack
- Parent of #15936

## Test
- cargo test -p codex-core without_config_toml

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-17 17:56:58 -07:00
canvrno-oai
06f8ec54db /plugins: Add inline enablement toggles (#18395)
This PR adds inline enable/disable controls to the new /plugins browse
menu. Installed plugins can now be toggled directly from the list with
keyboard interaction, and the associated config-write plumbing is
included so the UI and persisted plugin state stay in sync. This also
includes the queued-write handling needed to avoid stale toggle
completions overwriting newer intent.

- Add toggleable plugin rows for installed plugins in /plugins
- Support Space to enable or disable without leaving the list
- Persist plugin enablement through the existing app/config write path
- Preserve the current selection while the list refreshes after a toggle
- Add tests and snapshot updates for toggling behavior

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-17 17:33:11 -07:00
xl-openai
26d9894a27 feat: Add remote plugin fields to plugin API (#17277)
## Summary
Update the plugin API for the new remote plugin model.

The mental model is no longer “keep local plugin state in sync with
remote.” Instead, local and remote plugins are becoming separate
sources. Remote catalog entries can be shown directly from the remote
API before installation; after installation they are still downloaded
into the local cache for execution, but remote installed state will come
from the API and be held in memory rather than being read from config.

• ## API changes
- Remove `forceRemoteSync` from `plugin/list`, `plugin/install`, and
`plugin/uninstall`.
  - Remove `remoteSyncError` from `plugin/list`.
  - Add remote-capable metadata to `plugin/list` / `plugin/read`:
    - nullable `marketplaces[].path`
    - `source: { type: "remote", downloadUrl }`
    - URL asset fields alongside local path fields:
  `composerIconUrl`, `logoUrl`, `screenshotUrls`
  - Make `plugin/read` and `plugin/install` source-compatible:
    - `marketplacePath?: AbsolutePathBuf | null`
    - `remoteMarketplaceName?: string | null`
    - exactly one source is required at runtime
2026-04-17 16:47:58 -07:00
Michael Bolin
96d35dd640 bazel: use native rust test sharding (#18082)
## Why

The large Rust test suites are slow and include some of our flakiest
tests, so we want to run them with Bazel native sharding while keeping
shard membership stable between runs.

This is the simpler follow-up to the explicit-label experiment in
#17998. Since #18397 upgraded Codex to `rules_rs` `0.0.58`, which
includes the stable test-name hashing support from
hermeticbuild/rules_rust#14, this PR only needs to wire Codex's Bazel
macros into that support.

Using native sharding preserves BuildBuddy's sharded-test UI and Bazel's
per-shard test action caching. Using stable name hashing avoids
reshuffling every test when one test is added or removed.

## What Changed

`codex_rust_crate` now accepts `test_shard_counts` and applies the right
Bazel/rules_rust attributes to generated unit and integration test
rules. Matched tests are also marked `flaky = True`, giving them Bazel's
default three attempts.

This PR shards these labels 8 ways:

```text
//codex-rs/core:core-all-test
//codex-rs/core:core-unit-tests
//codex-rs/app-server:app-server-all-test
//codex-rs/app-server:app-server-unit-tests
//codex-rs/tui:tui-unit-tests
```

## Verification

`bazel query --output=build` over the selected public labels and their
inner unit-test binaries confirmed the expected `shard_count = 8`,
`flaky = True`, and `experimental_enable_sharding = True` attributes.

Also verified that we see the shards as expected in BuildBuddy so they
can be analyzed independently.

Co-authored-by: Codex <noreply@openai.com>
2026-04-17 23:14:11 +00:00
richardopenai
139fa8b8f2 [codex] Propagate rate limit reached type (#18227)
## Summary

First PR in the split from #17956.

- adds the core/app-server `RateLimitReachedType` shape
- maps backend `rate_limit_reached_type` into Codex rate-limit snapshots
- carries the field through app-server notifications/responses and
generated schemas
- updates existing constructors/tests for the new optional field

## Validation

- `cargo test -p codex-backend-client`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server rate_limits`
- `cargo test -p codex-tui workspace_`
- `cargo test -p codex-tui status_`
- `just fmt`
- `just fix -p codex-backend-client`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
2026-04-17 13:37:25 -07:00
canvrno-oai
f017a23835 /plugins: Add v2 tabbed marketplace menu (#18222)
This PR moves `/plugins` onto the shared tabbed selection-list
infrastructure and introduces the new v2 menu. The menu now groups
plugins into All Plugins, Installed, OpenAI Curated, and per-marketplace
tabs.

- Rebuild /plugins on top of the shared tabbed selection list
- Add All Plugins, Installed, OpenAI Curated, and per-marketplace tabs
- Preserve active tab and selected-row behavior across popup refreshes
- Add duplicate marketplace tab-label disambiguation
- Update browse-mode popup tests and snapshots

Co-authored-by: Codex <noreply@openai.com>
2026-04-17 12:59:18 -07:00
Felipe Coury
48f117d0a2 perf(tui): defer startup skills refresh (#18370)
# Summary

This removes startup `skills/list` from the critical path to first
input. In release measurements, median startup-to-input time improved
from `307.5 ms` to `191.0 ms` across 30 measured runs with 5 warmups.

# Background

Startup currently waits for a forced `skills/list` app-server request
before scheduling the first usable TUI frame. That makes skill metadata
freshness part of the process-launch-to-input path, even though the
prompt can safely accept normal input before skill metadata has finished
loading.

I measured startup from process launch until the TUI reports that the
user can type. The measurement harness watched the startup measurement
record, killed Codex after a successful sample, and enforced a timeout
so repeated runs would not leave TUI processes behind. The debug runs
had enough outliers that I used median as the primary signal and ran a
baseline self-compare to understand the noise floor.

# Why skills/list

The `skills/list` cut was the best practical optimization because it
improved startup without changing the important readiness contract: when
the prompt is shown, it is still backed by an active session. Only
enrichment data arrives later.

| Candidate | Result | Decision |
| --- | --- | --- |
| Defer startup `skills/list` | Debug median improved from `524.0 ms` to
`348.0 ms`; release median improved from `307.5 ms` to `191.0 ms`. |
Keep |
| Defer fresh `thread/start` | Debug median improved from `494.0 ms` to
`256.0 ms`, but the prompt could appear before an active thread was
attached. | Reject as too risky for this PR |
| Avoid forced skills config reload | Debug median moved from `509.0 ms`
to `512.0 ms`. | Reject as neutral |
| Skip fresh history metadata | Debug median moved from `496.5 ms` to
`531.5 ms`. | Reject as regression/noise |
| Defer app-server startup | Not implemented because it would only
permit a loading frame unless the TUI gained a deliberate pre-server
state. | Out of scope |

# Implementation

`App::refresh_startup_skills` now clones the app-server request handle,
spawns a background task, and issues the same forced `skills/list`
request after the first frame is scheduled. When the request completes,
the task sends `AppEvent::SkillsListLoaded` back through the normal app
event queue.

The existing skills response handling still converts the app-server
response, updates the chat widget, and emits invalid `SKILL.md`
warnings. Explicit user-initiated skills refreshes still use the
existing synchronous app command path, so callers that intentionally
requested fresh skill state do not race ahead of their own refresh.

# Tradeoffs

The main tradeoff is a narrow theoretical race at startup: skill mention
completion depends on a background `skills/list` response, so it could
briefly show stale or empty metadata if opened before that response
arrives. In manual testing, pressing `$` as soon as possible after
launch still showed populated skill metadata, so this risk appears
minimal in normal use. Plain input remains available immediately, and
the UI updates through the existing skills response path once the
refresh completes.

This PR does not change how skills are discovered, cached,
force-reloaded, displayed, enabled, or warned about. It only changes
when the startup refresh is allowed to complete relative to the first
usable TUI frame.

# Verification

- `cargo test -p codex-tui`
2026-04-17 16:55:00 -03:00
Ahmed Ibrahim
0f0ef094b6 Show default reasoning in /status (#18373)
- Shows the model catalog default reasoning effort when no reasoning
override is configured.
- Adds /status coverage for the empty-config fallback.
2026-04-17 12:21:09 -07:00
David de Regt
eaf78e43f2 Add sorting/backwardsCursor to thread/list and new thread/turns/list api (#17305)
To improve performance of UI loads from the app, add two main
improvements:
1. The `thread/list` api now gets a `sortDirection` request field and a
`backwardsCursor` to the response, which lets you paginate forwards and
backwards from a window. This lets you fetch the first few items to
display immediately while you paginate to fill in history, then can
paginate "backwards" on future loads to catch up with any changes since
the last UI load without a full reload of the entire data set.
2. Added a new `thread/turns/list` api which also has sortDirection and
backwardsCursor for the same behavior as `thread/list`, allowing you the
same small-fetch for immediate display followed by background fill-in
and resync catchup.
2026-04-17 11:49:02 -07:00
Felipe Coury
d3692b14c9 feat(tui): add clear-context plan implementation (#17499)
## TL;DR

- Adds a second Plan Mode handoff: implement the approved plan after
clearing context.
- Keeps the existing same-thread `Yes, implement this plan` action
unchanged.
- Reuses the `/clear` thread-start path and submits the approved plan as
the fresh thread's first prompt.
- Covers the new popup option, event plumbing, initial-message behavior,
and disabled states in TUI tests.

## Problem

Plan Mode already asks whether to implement an approved plan, but the
only affirmative path continues in the same thread. That is useful when
the planning conversation itself is still valuable, but it does not
support the workflow where exploratory planning context is discarded and
implementation starts from the final approved plan as the only
model-visible handoff.

<img width="1253" height="869" alt="image"
src="https://github.com/user-attachments/assets/90023d75-c330-4919-bed8-518671c3474b"
/>

## Mental model

There are now two implementation choices after a proposed plan. The
existing choice, `Yes, implement this plan`, is unchanged: it switches
to Default mode and submits `Implement the plan.` in the current thread.
The new choice, `Yes, clear context and implement`, treats the proposed
plan as a handoff artifact. It clears the UI/session context through the
same thread-start source used by `/clear`, then submits an initial
prompt containing the approved plan after the fresh thread is
configured.

The important distinction is that the new path is not compaction. The
model receives a deliberate implementation prompt built from the
approved plan markdown, not a summary of the previous planning
transcript. Both implementation choices require the Default
collaboration preset to be available, so the popup does not offer a
coding handoff when the fresh thread would fall back to another mode.

## Non-goals

This change does not alter `/clear`, `/compact`, or the existing
same-context Plan Mode implementation option. It does not add protocol
surface area or app-server schema changes. It also does not carry the
previous transcript path or a generated planning summary into the new
model context.

## Tradeoffs

The fresh-context option relies on the approved plan being sufficiently
complete. That matches the Plan Mode contract, but it means vague plans
will produce weaker implementation starts than a compacted transcript
would. The upside is that rejected ideas, exploratory dead ends, and
planning corrections do not leak into the implementation turn.

The current implementation stores the latest proposed plan in
`ChatWidget` rather than deriving it from history cells at selection
time. This keeps the popup action simple and deterministic, but it makes
the cache lifecycle important: it must be reset when a new task starts
so an old plan cannot be submitted later.

## Architecture

The TUI stores the most recent completed proposed-plan markdown when a
plan item completes. The Plan Mode approval popup uses that cache to
enable the fresh-context option and to build a first-turn prompt that
instructs the model to implement the approved plan in a fresh context.

Selecting the new option emits a TUI-internal
`ClearUiAndSubmitUserMessage` event. `App` handles that event by reusing
the existing clear flow: clear terminal state, reset app UI state, start
a new app-server thread with `ThreadStartSource::Clear`, and attach a
replacement `ChatWidget` with an initial user message. The existing
initial-message suppression in `enqueue_primary_thread_session` ensures
the prompt is submitted only after the new session is configured and any
startup replay is rendered.

## Observability

The previous thread remains resumable through the existing clear-session
summary hint. There is no new telemetry or protocol event for this path,
so debugging should start at the TUI event boundary: confirm the popup
emitted `ClearUiAndSubmitUserMessage`, confirm the app-server thread
start used `ThreadStartSource::Clear`, then confirm the fresh widget
submitted the initial user message after `SessionConfigured`.

## Tests

The Plan Mode popup snapshots cover the new option and preserve the
original option as the first/default action. Unit coverage verifies the
original same-context option still emits `SubmitUserMessageWithMode`,
the new option emits `ClearUiAndSubmitUserMessage` with the approved
plan embedded verbatim, and the clear-context option is disabled when
Default mode is unavailable or no approved plan exists. The broader
`codex-tui` test package passes with the updated fresh-thread
initial-message plumbing.
2026-04-17 14:30:09 -03:00