## Why
Enterprises can already constrain approvals, sandboxing, and web search
through `requirements.toml` and MDM, but feature flags were still only
configurable as managed defaults. That meant an enterprise could suggest
feature values, but it could not actually pin them.
This change closes that gap and makes enterprise feature requirements
behave like the other constrained settings. The effective feature set
now stays consistent with enterprise requirements during config load,
when config writes are validated, and when runtime code mutates feature
flags later in the session.
It also tightens the runtime API for managed features. `ManagedFeatures`
now follows the same constraint-oriented shape as `Constrained<T>`
instead of exposing panic-prone mutation helpers, and production code
can no longer construct it through an unconstrained `From<Features>`
path.
The PR also hardens the `compact_resume_fork` integration coverage on
Windows. After the feature-management changes,
`compact_resume_after_second_compaction_preserves_history` was
overflowing the libtest/Tokio thread stacks on Windows, so the test now
uses an explicit larger-stack harness as a pragmatic mitigation. That
may not be the ideal root-cause fix, and it merits a parallel
investigation into whether part of the async future chain should be
boxed to reduce stack pressure instead.
## What Changed
Enterprises can now pin feature values in `requirements.toml` with the
requirements-side `features` table:
```toml
[features]
personality = true
unified_exec = false
```
Only canonical feature keys are allowed in the requirements `features`
table; omitted keys remain unconstrained.
- Added a requirements-side pinned feature map to
`ConfigRequirementsToml`, threaded it through source-preserving
requirements merge and normalization in `codex-config`, and made the
TOML surface use `[features]` (while still accepting legacy
`[feature_requirements]` for compatibility).
- Exposed `featureRequirements` from `configRequirements/read`,
regenerated the JSON/TypeScript schema artifacts, and updated the
app-server README.
- Wrapped the effective feature set in `ManagedFeatures`, backed by
`ConstrainedWithSource<Features>`, and changed its API to mirror
`Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
and result-returning `enable` / `disable` / `set_enabled` helpers.
- Removed the legacy-usage and bulk-map passthroughs from
`ManagedFeatures`; callers that need those behaviors now mutate a plain
`Features` value and reapply it through `set(...)`, so the constrained
wrapper remains the enforcement boundary.
- Removed the production loophole for constructing unconstrained
`ManagedFeatures`. Non-test code now creates it through the configured
feature-loading path, and `impl From<Features> for ManagedFeatures` is
restricted to `#[cfg(test)]`.
- Rejected legacy feature aliases in enterprise feature requirements,
and return a load error when a pinned combination cannot survive
dependency normalization.
- Validated config writes against enterprise feature requirements before
persisting changes, including explicit conflicting writes and
profile-specific feature states that normalize into invalid
combinations.
- Updated runtime and TUI feature-toggle paths to use the constrained
setter API and to persist or apply the effective post-constraint value
rather than the requested value.
- Updated the `core_test_support` Bazel target to include the bundled
core model-catalog fixtures in its runtime data, so helper code that
resolves `core/models.json` through runfiles works in remote Bazel test
environments.
- Renamed the core config test coverage to emphasize that effective
feature values are normalized at runtime, while conflicting persisted
config writes are rejected.
- Ran `compact_resume_after_second_compaction_preserves_history` inside
an explicit 8 MiB test thread and Tokio runtime worker stack, following
the existing larger-stack integration-test pattern, to keep the Windows
`compact_resume_fork` test slice from aborting while a parallel
investigation continues into whether some of the underlying async
futures should be boxed.
## Verification
- `cargo test -p codex-config`
- `cargo test -p codex-core feature_requirements_ -- --nocapture`
- `cargo test -p codex-core
load_requirements_toml_produces_expected_constraints -- --nocapture`
- `cargo test -p codex-core
compact_resume_after_second_compaction_preserves_history -- --nocapture`
- `cargo test -p codex-core compact_resume_fork -- --nocapture`
- Re-ran the built `codex-core` `tests/all` binary with
`RUST_MIN_STACK=262144` for
`compact_resume_after_second_compaction_preserves_history` to confirm
the explicit-stack harness fixes the deterministic low-stack repro.
- `cargo test -p codex-core`
- This still fails locally in unrelated integration areas that expect
the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
wiremock mismatches.
## Docs
`developers.openai.com/codex` should document the requirements-side
`[features]` table for enterprise and MDM-managed configuration,
including that it only accepts canonical feature keys and that
conflicting config writes are rejected.
## Summary
- record a realtime close developer message when a new realtime session
replaces an active one
- assert the replacement marker through the mocked responses request
path
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Charles Cunningham <ccunningham@openai.com>
## Summary
This PR includes the session's local date and timezone in the
model-visible environment context and persists that data in
`TurnContextItem`.
## What changed
- captures the current local date and IANA timezone when building a turn
context, with a UTC fallback if the timezone lookup fails
- includes current_date and timezone in the serialized
<environment_context> payload
- stores those fields on TurnContextItem so they survive rollout/history
handling, subagent review threads, and resume flows
- treats date/timezone changes as environment updates, so prompt caching
and context refresh logic do not silently reuse stale time context
- updates tests to validate the new environment fields without depending
on a single hardcoded environment-context string
## test
built a local build and saw it in the rollout file:
```
{"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n <shell>zsh</shell>\n <current_date>2026-02-26</current_date>\n <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}}
```
## Summary
- make resume/fork targets explicit and typed as `SessionTarget { path,
thread_id }` (non-optional `thread_id`)
- resolve `thread_id` centrally via `resolve_session_thread_id(...)`:
- use CLI input directly when it is a UUID (`--resume <uuid>` / `--fork
<uuid>`)
- otherwise read `thread_id` from rollout `SessionMeta` for path-based
selections (picker, `--resume-last`, name-based resume/fork)
- use `thread_id` to read cwd from SQLite first during resume/fork cwd
resolution
- keep rollout fallback for cwd resolution when SQLite is unavailable or
does not return thread metadata (`TurnContext` tail, then `SessionMeta`)
- keep the resume picker open when a selected row has unreadable session
metadata, and show an inline recoverable error instead of aborting the
TUI
## Why
This removes ad-hoc rollout filename parsing and makes resume/fork
target identity explicit. The resume/fork cwd check can use indexed
SQLite lookup by `thread_id` in the common path, while preserving
rollout-based fallback behavior. It also keeps malformed legacy rows
recoverable in the picker instead of letting a selection failure unwind
the app.
## Notes
- minimal TUI-only change; no schema/protocol changes
- includes TUI test coverage for SQLite cwd precedence when `thread_id`
is available
- includes TUI regression coverage for picker inline error rendering /
non-fatal unreadable session rows
## Codex author
`codex resume 019c9205-7f8b-7173-a2a2-f082d4df3de3`
## Summary
- make `Config.model_reasoning_summary` optional so unset means use
model default
- resolve the optional config value to a concrete summary when building
`TurnContext`
- add protocol support for `default_reasoning_summary` in model metadata
## Validation
- `cargo test -p codex-core --lib client::tests -- --nocapture`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs` previously
located `codex-execve-wrapper` by scanning `PATH` and sibling
directories. That lookup is brittle and can select the wrong binary when
the runtime environment differs from startup assumptions.
We already pass `codex-linux-sandbox` from `codex-arg0`;
`codex-execve-wrapper` should use the same startup-driven path plumbing.
## What changed
- Introduced `Arg0DispatchPaths` in `codex-arg0` to carry both helper
executable paths:
- `codex_linux_sandbox_exe`
- `main_execve_wrapper_exe`
- Updated `arg0_dispatch_or_else()` to pass `Arg0DispatchPaths` to
top-level binaries and preserve helper paths created in
`prepend_path_entry_for_codex_aliases()`.
- Threaded `Arg0DispatchPaths` through entrypoints in `cli`, `exec`,
`tui`, `app-server`, and `mcp-server`.
- Added `main_execve_wrapper_exe` to core configuration plumbing
(`Config`, `ConfigOverrides`, and `SessionServices`).
- Updated zsh-fork shell escalation to consume the configured
`main_execve_wrapper_exe` and removed path-sniffing fallback logic.
- Updated app-server config reload paths so reloaded configs keep the
same startup-provided helper executable paths.
## References
- [`Arg0DispatchPaths`
definition](e355b43d5c/codex-rs/arg0/src/lib.rs (L20-L24))
- [`arg0_dispatch_or_else()` forwarding both
paths](e355b43d5c/codex-rs/arg0/src/lib.rs (L145-L176))
- [zsh-fork escalation using configured wrapper
path](e355b43d5c/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs (L109-L150))
## Testing
- `cargo check -p codex-arg0 -p codex-core -p codex-exec -p codex-tui -p
codex-mcp-server -p codex-app-server`
- `cargo test -p codex-arg0`
- `cargo test -p codex-core tools::runtimes::shell::unix_escalation:: --
--nocapture`
- Add a hidden `realtime_conversation` feature flag and `/realtime`
slash command for start/stop live voice sessions.
- Reuse transcription composer/footer UI for live metering, stream mic
audio, play assistant audio, render realtime user text events, and
force-close on feature disable.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Adds syntax highlighting to the TUI for fenced code blocks in markdown
responses and file diffs, plus a `/theme` command with live preview and
persistent theme selection. Uses syntect (~250 grammars, 32 bundled
themes, ~1 MB binary cost) — the same engine behind `bat`, `delta`, and
`xi-editor`. Includes guardrails for large inputs, graceful fallback to
plain text, and SSH-aware clipboard integration for the `/copy` command.
<img width="1554" height="1014" alt="image"
src="https://github.com/user-attachments/assets/38737a79-8717-4715-b857-94cf1ba59b85"
/>
<img width="2354" height="1374" alt="image"
src="https://github.com/user-attachments/assets/25d30a00-c487-4af8-9cb6-63b0695a4be7"
/>
## Problem
Code blocks in the TUI (markdown responses and file diffs) render
without syntax highlighting, making it hard to scan code at a glance.
Users also have no way to pick a color theme that matches their terminal
aesthetic.
## Mental model
The highlighting system has three layers:
1. **Syntax engine** (`render::highlight`) -- a thin wrapper around
syntect + two-face. It owns a process-global `SyntaxSet` (~250 grammars)
and a `RwLock<Theme>` that can be swapped at runtime. All public entry
points accept `(code, lang)` and return ratatui `Span`/`Line` vectors or
`None` when the language is unrecognized or the input exceeds safety
guardrails.
2. **Rendering consumers** -- `markdown_render` feeds fenced code blocks
through the engine; `diff_render` highlights Add/Delete content as a
whole file and Update hunks per-hunk (preserving parser state across
hunk lines). Both callers fall back to plain unstyled text when the
engine returns `None`.
3. **Theme lifecycle** -- at startup the config's `tui.theme` is
resolved to a syntect `Theme` via `set_theme_override`. At runtime the
`/theme` picker calls `set_syntax_theme` to swap themes live; on cancel
it restores the snapshot taken at open. On confirm it persists `[tui]
theme = "..."` to config.toml.
## Non-goals
- Inline diff highlighting (word-level change detection within a line).
- Semantic / LSP-backed highlighting.
- Theme authoring tooling; users supply standard `.tmTheme` files.
## Tradeoffs
| Decision | Upside | Downside |
| ------------------------------------------------ |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
-----------------------------------------------------------------------------------------------------------------------
|
| syntect over tree-sitter / arborium | ~1 MB binary increase for ~250
grammars + 32 themes; battle-tested crate powering widely-used tools
(`bat`, `delta`, `xi-editor`). tree-sitter would add ~12 MB for 20-30
languages or ~35 MB for full coverage. | Regex-based; less structurally
accurate than tree-sitter for some languages (e.g. language injections
like JS-in-HTML). |
| Global `RwLock<Theme>` | Enables live `/theme` preview without
threading Theme through every call site | Lock contention risk
(mitigated: reads vastly outnumber writes, single UI thread) |
| Skip background / italic / underline from themes | Terminal BG
preserved, avoids ugly rendering on some themes | Themes that rely on
these properties lose fidelity |
| Guardrails: 512 KB / 10k lines | Prevents pathological stalls on huge
diffs or pastes | Very large files render without color |
## Architecture
```
config.toml ─[tui.theme]─> set_theme_override() ─> THEME (RwLock)
│
┌───────────────────────────────────────────┘
│
markdown_render ─── highlight_code_to_lines(code, lang) ─> Vec<Line>
diff_render ─── highlight_code_to_styled_spans(code, lang) ─> Option<Vec<Vec<Span>>>
│
│ (None ⇒ plain text fallback)
│
/theme picker ─── set_syntax_theme(theme) // live preview swap
─── current_syntax_theme() // snapshot for cancel
─── resolve_theme_by_name(name) // lookup by kebab-case
```
Key files:
- `tui/src/render/highlight.rs` -- engine, theme management, guardrails
- `tui/src/diff_render.rs` -- syntax-aware diff line wrapping
- `tui/src/theme_picker.rs` -- `/theme` command builder
- `tui/src/bottom_pane/list_selection_view.rs` -- side content panel,
callbacks
- `core/src/config/types.rs` -- `Tui::theme` field
- `core/src/config/edit.rs` -- `syntax_theme_edit()` helper
## Observability
- `tracing::warn` when a configured theme name cannot be resolved.
- `Config::startup_warnings` surfaces the same message as a TUI banner.
- `tracing::error` when persisting theme selection fails.
## Tests
- Unit tests in `highlight.rs`: language coverage, fallback behavior,
CRLF stripping, style conversion, guardrail enforcement, theme name
mapping exhaustiveness.
- Unit tests in `diff_render.rs`: snapshot gallery at multiple terminal
sizes (80x24, 94x35, 120x40), syntax-highlighted wrapping, large-diff
guardrail, rename-to-different-extension highlighting, parser state
preservation across hunk lines.
- Unit tests in `theme_picker.rs`: preview rendering (wide + narrow),
dim overlay on deletions, subtitle truncation, cancel-restore, fallback
for unavailable configured theme.
- Unit tests in `list_selection_view.rs`: side layout geometry, stacked
fallback, buffer clearing, cancel/selection-changed callbacks.
- Integration test in `lib.rs`: theme warning uses the final
(post-resume) config.
## Cargo Deny: Unmaintained Dependency Exceptions
This PR adds two `cargo deny` advisory exceptions for transitive
dependencies pulled in by `syntect v5.3.0`:
| Advisory | Crate | Status |
|----------|-------|--------|
| RUSTSEC-2024-0320 | `yaml-rust` | Unmaintained (maintainer
unreachable) |
| RUSTSEC-2025-0141 | `bincode` | Unmaintained (development ceased;
v1.3.3 considered complete) |
**Why this is safe in our usage:**
- Neither advisory describes a known security vulnerability. Both are
"unmaintained" notices only.
- `bincode` is used by syntect to deserialize pre-compiled syntax sets.
Again, these are **static vendored artifacts** baked into the binary at
build time. No user-supplied bincode data is ever deserialized. - Attack
surface is zero for both crates; exploitation would require a
supply-chain compromise of our own build artifacts.
- These exceptions can be removed when syntect migrates to `yaml-rust2`
and drops `bincode`, or when alternative crates are available upstream.
## Why
`codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
from `codex-protocol` and `codex-shell-command`. That made it easy for
workspace crates to import those APIs through `codex-core`, which in
turn hides dependency edges and makes it harder to reduce compile-time
coupling over time.
This change removes those public re-exports so call sites must import
from the source crates directly. Even when a crate still depends on
`codex-core` today, this makes dependency boundaries explicit and
unblocks future work to drop `codex-core` dependencies where possible.
## What Changed
- Removed public re-exports from `codex-rs/core/src/lib.rs` for:
- `codex_protocol::protocol` and related protocol/model types (including
`InitialHistory`)
- `codex_protocol::config_types` (`protocol_config_types`)
- `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
parse_command, powershell}`
- Migrated workspace Rust call sites to import directly from:
- `codex_protocol::protocol`
- `codex_protocol::config_types`
- `codex_protocol::models`
- `codex_shell_command`
- Added explicit `Cargo.toml` dependencies (`codex-protocol` /
`codex-shell-command`) in crates that now import those crates directly.
- Kept `codex-core` internal modules compiling by using `pub(crate)`
aliases in `core/src/lib.rs` (internal-only, not part of the public
API).
- Updated the two utility crates that can already drop a `codex-core`
dependency edge entirely:
- `codex-utils-approval-presets`
- `codex-utils-cli`
## Verification
- `cargo test -p codex-utils-approval-presets`
- `cargo test -p codex-utils-cli`
- `cargo check --workspace --all-targets`
- `just clippy`
## Summary
- change the cwd-change prompt (shown when resuming/forking across
different directories) so `Ctrl+C`/`Ctrl+D` exits the session instead of
implicitly selecting "Use session directory"
- introduce explicit prompt and resolver exit outcomes so this intent is
propagated cleanly through both startup resume/fork and in-app `/resume`
flows
- add a unit test that verifies `Ctrl+C` exits rather than selecting an
option
## Why
Previously, pressing `Ctrl+C` on this prompt silently picked one of the
options, which made it hard to abort. This aligns the prompt with the
expected quit behavior.
## Codex author
`codex resume 019c6d39-bbfb-7dc3-8008-1388a054e86d`
Summary
- rename the `collab` handlers and UI files to `multi_agents` to match
the new naming
- update module references and specs so the handlers and TUI widgets
consistently use the renamed files
- keep the existing functionality while aligning file and module names
with the multi-agent terminology
Currently, if there are syntax errors detected in the starlark rules
file, the entire policy is silently ignored by the CLI. The app server
correctly emits a message that can be displayed in a GUI.
This PR changes the CLI (both the TUI and non-interactive exec) to fail
when the rules file can't be parsed. It then prints out an error message
and exits with a non-zero exit code. This is consistent with the
handling of errors in the config file.
This addresses #11603
## Summary
This PR delivers the first small, shippable step toward model-visible
state diffing by making
`TurnContextItem` more complete and standardizing how it is built.
Specifically, it:
- Adds persisted network context to `TurnContextItem`.
- Introduces a single canonical `TurnContext -> TurnContextItem`
conversion path.
- Routes existing rollout write sites through that canonical conversion
helper.
No context injection/diff behavior changes are included in this PR.
## Why this change
The design goal is to make `TurnContextItem` the canonical source of
truth for context-diff
decisions.
Before this PR:
- `TurnContextItem` did not include all TurnContext-derived environment
inputs needed for v1
completeness.
- Construction was duplicated at multiple write sites.
This PR addresses both with a minimal, reviewable change.
## Changes
### 1) Extend `TurnContextItem` with network state
- Added `TurnContextNetworkItem { allowed_domains, denied_domains }`.
- Added `network: Option<TurnContextNetworkItem>` to `TurnContextItem`.
- Kept backward compatibility by making the new field optional and
skipped when absent.
Files:
- `codex-rs/protocol/src/protocol.rs`
### 2) Canonical conversion helper
- Added `TurnContext::to_turn_context_item(collaboration_mode)` in core.
- Added internal helper to derive network fields from
`config_layer_stack.requirements().network`.
Files:
- `codex-rs/core/src/codex.rs`
### 3) Use canonical conversion at rollout write sites
- Replaced ad hoc `TurnContextItem { ... }` construction with
`to_turn_context_item(...)` in:
- sampling request path
- compaction path
Files:
- `codex-rs/core/src/codex.rs`
- `codex-rs/core/src/compact.rs`
### 4) Update fixtures/tests for new optional field
- Updated existing `TurnContextItem` literals in tests to include
`network: None`.
- Added protocol tests for:
- deserializing old payloads with no `network`
- serializing when `network` is present
Files:
- `codex-rs/core/tests/suite/resume_warning.rs`
- No replay/diff logic changes.
- Persisted rollout `TurnContextItem` now carries additional network
context when available.
- Older rollout lines without `network` remain readable.
## Why
We currently carry multiple permission-related concepts directly on
`Config` for shell/unified-exec behavior (`approval_policy`,
`sandbox_policy`, `network`, `shell_environment_policy`,
`windows_sandbox_mode`).
Consolidating these into one in-memory struct makes permission handling
easier to reason about and sets up the next step: supporting named
permission profiles (`[permissions.PROFILE_NAME]`) without changing
behavior now.
This change is mostly mechanical: it updates existing callsites to go
through `config.permissions`, but it does not yet refactor those
callsites to take a single `Permissions` value in places where multiple
permission fields are still threaded separately.
This PR intentionally **does not** change the on-disk `config.toml`
format yet and keeps compatibility with legacy config keys.
## What Changed
- Introduced `Permissions` in `core/src/config/mod.rs`.
- Added `Config::permissions` and moved effective runtime permission
fields under it:
- `approval_policy`
- `sandbox_policy`
- `network`
- `shell_environment_policy`
- `windows_sandbox_mode`
- Updated config loading/building so these effective values are still
derived from the same existing config inputs and constraints.
- Updated Windows sandbox helpers/resolution to read/write via
`permissions`.
- Threaded the new field through all permission consumers across core
runtime, app-server, CLI/exec, TUI, and sandbox summary code.
- Updated affected tests to reference `config.permissions.*`.
- Renamed the struct/field from
`EffectivePermissions`/`effective_permissions` to
`Permissions`/`permissions` and aligned variable naming accordingly.
## Verification
- `just fix -p codex-core -p codex-tui -p codex-cli -p codex-app-server
-p codex-exec -p codex-utils-sandbox-summary`
- `cargo build -p codex-core -p codex-tui -p codex-cli -p
codex-app-server -p codex-exec -p codex-utils-sandbox-summary`
1. Move Windows Sandbox NUX to right after trust directory screen
2. Don't offer read-only as an option in Sandbox NUX.
Elevated/Legacy/Quit
3. Don't allow new untrusted directories. It's trust or quit
4. move experimental sandbox features to `[windows]
sandbox="elevated|unelevatd"`
5. Copy tweaks = elevated -> default, non-elevated -> non-admin
We're loading these from the web on every startup. This puts them in a
local file with a 1hr TTL.
We sign the downloaded requirements with a key compiled into the Codex
CLI to prevent unsophisticated tampering (determined circumvention is
outside of our threat model: after all, one could just compile Codex
without any of these checks).
If any of the following are true, we ignore the local cache and re-fetch
from Cloud:
* The signature is invalid for the payload (== requirements, sign time,
ttl, user identity)
* The identity does not match the auth'd user's identity
* The TTL has expired
* We cannot parse requirements.toml from the payload
We are removing feature-gated shared crates from the `codex-rs`
workspace. `codex-common` grouped several unrelated utilities behind
`[features]`, which made dependency boundaries harder to reason about
and worked against the ongoing effort to eliminate feature flags from
workspace crates.
Splitting these utilities into dedicated crates under `utils/` aligns
this area with existing workspace structure and keeps each dependency
explicit at the crate boundary.
## What changed
- Removed `codex-rs/common` (`codex-common`) from workspace members and
workspace dependencies.
- Added six new utility crates under `codex-rs/utils/`:
- `codex-utils-cli`
- `codex-utils-elapsed`
- `codex-utils-sandbox-summary`
- `codex-utils-approval-presets`
- `codex-utils-oss`
- `codex-utils-fuzzy-match`
- Migrated the corresponding modules out of `codex-common` into these
crates (with tests), and added matching `BUILD.bazel` targets.
- Updated direct consumers to use the new crates instead of
`codex-common`:
- `codex-rs/cli`
- `codex-rs/tui`
- `codex-rs/exec`
- `codex-rs/app-server`
- `codex-rs/mcp-server`
- `codex-rs/chatgpt`
- `codex-rs/cloud-tasks`
- Updated workspace lockfile entries to reflect the new dependency graph
and removal of `codex-common`.
Problem:
1. turn id is constructed in-memory;
2. on resuming threads, turn_id might not be unique;
3. client cannot no the boundary of a turn from rollout files easily.
This PR does three things:
1. persist `task_started` and `task_complete` events;
1. persist `turn_id` in rollout turn events;
5. generate turn_id as unique uuids instead of incrementing it in
memory.
This helps us resolve the issue of clients wanting to have unique turn
ids for resuming a thread, and knowing the boundry of each turn in
rollout files.
example debug logs
```
2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
```
backward compatibility:
if you try to resume an old session without task_started and
task_complete event populated, the following happens:
- If you resume and do nothing: those reconstructed historical IDs can
differ next time you resume.
- If you resume and send a new turn: the new turn gets a fresh UUID from
live submission flow and is persisted, so that new turn’s ID is stable
on later resumes.
I think this behavior is fine, because we only care about deterministic
turn id once a turn is triggered.
## What changed
- In `codex-rs/core/src/skills/injection.rs`, we now honor explicit
`UserInput::Skill { name, path }` first, then fall back to text mentions
only when safe.
- In `codex-rs/tui/src/bottom_pane/chat_composer.rs`, mention selection
is now token-bound (selected mention is tied to the specific inserted
`$token`), and we snapshot bindings at submit time so selection is not
lost.
- In `codex-rs/tui/src/chatwidget.rs` and
`codex-rs/tui/src/bottom_pane/mod.rs`, submit/queue paths now consume
the submit-time mention snapshot (instead of rereading cleared composer
state).
- In `codex-rs/tui/src/mention_codec.rs` and
`codex-rs/tui/src/bottom_pane/chat_composer_history.rs`, history now
round-trips mention targets so resume restores the same selected
duplicate.
- In `codex-rs/tui/src/bottom_pane/skill_popup.rs` and
`codex-rs/tui/src/bottom_pane/chat_composer.rs`, duplicate labels are
normalized to `[Repo]` / `[App]`, app rows no longer show `Connected -`,
and description space is a bit wider.
<img width="550" height="163" alt="Screenshot 2026-02-05 at 9 56 56 PM"
src="https://github.com/user-attachments/assets/346a7eb2-a342-4a49-aec8-68dfec0c7d89"
/>
<img width="550" height="163" alt="Screenshot 2026-02-05 at 9 57 09 PM"
src="https://github.com/user-attachments/assets/5e04d9af-cccf-4932-98b3-c37183e445ed"
/>
## Before vs now
- Before: selecting a duplicate could still submit the default/repo
match, and resume could lose which duplicate was originally selected.
- Now: the exact selected target (skill path or app id) is preserved
through submit, queue/restore, and resume.
## Manual test
1. Build and run this branch locally:
- `cd /Users/daniels/code/codex/codex-rs`
- `cargo build -p codex-cli --bin codex`
- `./target/debug/codex`
2. Open mention picker with `$` and pick a duplicate entry (not the
first one).
3. Confirm duplicate UI:
- repo duplicate rows show `[Repo]`
- app duplicate rows show `[App]`
- app description does **not** start with `Connected -`
4. Submit the prompt, then press Up to restore draft and submit again.
Expected: it keeps the same selected duplicate target.
5. Use `/resume` to reopen the session and send again.
Expected: restored mention still resolves to the same duplicate target.
So that the rest of the codebase (like TUI) don't need to be concerned
whether ChatGPT auth was handled by Codex itself or passed in via
app-server's external auth mode.
This PR addresses #10395
When a user is asked to pick the trust level of a project, the code
currently reloads the config if they select "trusted". It doesn't reload
the config in the "untrusted" case but should. This causes the sandbox
mode to be reported incorrectly in `/status` during the first run (it's
displayed as `read-only` even though it acts as though it's
`workspace-write`).
Previously, `CodexAuth` was defined as follows:
d550fbf41a/codex-rs/core/src/auth.rs (L39-L46)
But if you looked at its constructors, we had creation for
`AuthMode::ApiKey` where `storage` was built using a nonsensical path
(`PathBuf::new()`) and `auth_dot_json` was `None`:
d550fbf41a/codex-rs/core/src/auth.rs (L212-L220)
By comparison, when `AuthMode::ChatGPT` was used, `api_key` was always
`None`:
d550fbf41a/codex-rs/core/src/auth.rs (L665-L671)https://github.com/openai/codex/pull/10012 took things further because
it introduced a new `ChatgptAuthTokens` variant to `AuthMode`, which is
important in when invoking `account/login/start` via the app server, but
most logic _internal_ to the app server should just reason about two
`AuthMode` variants: `ApiKey` and `ChatGPT`.
This PR tries to clean things up as follows:
- `LoginAccountParams` and `AuthMode` in `codex-rs/app-server-protocol/`
both continue to have the `ChatgptAuthTokens` variant, though it is used
exclusively for the on-the-wire messaging.
- `codex-rs/core/src/auth.rs` now has its own `AuthMode` enum, which
only has two variants: `ApiKey` and `ChatGPT`.
- `CodexAuth` has been changed from a struct to an enum. It is a
disjoint union where each variant (`ApiKey`, `ChatGpt`, and
`ChatGptAuthTokens`) have only the associated fields that make sense for
that variant.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10208).
* #10224
* __->__ #10208
Load requirements from Codex Backend. It only does this for enterprise
customers signed in with ChatGPT.
Todo in follow-up PRs:
* Add to app-server and exec too
* Switch from fail-open to fail-closed on failure
Session renaming:
- `/rename my_session`
- `/rename` without arg and passing an argument in `customViewPrompt`
- AppExitInfo shows resume hint using the session name if set instead of
uuid, defaults to uuid if not set
- Names are stored in `CODEX_HOME/sessions.jsonl`
Session resuming:
- codex resume <name> lookup for `CODEX_HOME/sessions.jsonl` first entry
matching the name and resumes the session
---------
Co-authored-by: jif-oai <jif@openai.com>
# Summary
- Fix resume/fork config rebuild so cwd changes inside the TUI produce a
fully rebuilt Config (trust/approval/sandbox) instead of mutating only
the cwd.
- Preserve `--add-dir` behavior across resume/fork by normalizing
relative roots to absolute paths once (based on the original cwd).
- Prefer latest `TurnContext.cwd` for resume/fork prompts but fall back
to `SessionMeta.cwd` if the latest cwd no longer exists.
- Align resume/fork selection handling and ensure UI config matches the
resumed thread config.
- Fix Windows test TOML path escaping in trust-level test.
# Details
- Rebuild Config via `ConfigBuilder` when resuming into a different cwd;
carry forward runtime approval/sandbox overrides.
- Add `normalize_harness_overrides_for_cwd` to resolve relative
`additional_writable_roots` against the initial cwd before reuse.
- Guard `read_session_cwd` with filesystem existence check for the
latest `TurnContext.cwd`.
- Update naming/flow around cwd comparison and prompt selection.
<img width="603" height="150" alt="Screenshot 2026-01-23 at 5 42 13 PM"
src="https://github.com/user-attachments/assets/d1897386-bb28-4e8a-98cf-187fdebbecb0"
/>
And proof the model understands the new cwd:
<img width="828" height="353" alt="Screenshot 2026-01-22 at 5 36 45 PM"
src="https://github.com/user-attachments/assets/12aed8ca-dec3-4b64-8dae-c6b8cff78387"
/>
In a [recent PR](https://github.com/openai/codex/pull/9182), I made some
improvements to config error messages so errors didn't leave app server
clients in a dead state. This is a follow-on PR to make these error
messages more readable and actionable for both TUI and GUI users. For
example, see #9668 where the user was understandably confused about the
source of the problem and how to fix it.
The improved error message:
1. Clearly identifies the config file where the error was found (which
is more important now that we support layered configs)
2. Provides a line and column number of the error
3. Displays the line where the error occurred and underlines it
For example, if my `config.toml` includes the following:
```toml
[features]
collaboration_modes = "true"
```
Here's the current CLI error message:
```
Error loading config.toml: invalid type: string "true", expected a boolean in `features`
```
And here's the improved message:
```
Error loading config.toml:
/Users/etraut/.codex/config.toml:43:23: invalid type: string "true", expected a boolean
|
43 | collaboration_modes = "true"
| ^^^^^^
```
The bulk of the new logic is contained within a new module
`config_loader/diagnostics.rs` that is responsible for calculating the
text range for a given toml path (which is more involved than I would
have expected).
In addition, this PR adds the file name and text range to the
`ConfigWarningNotification` app server struct. This allows GUI clients
to present the user with a better error message and an optional link to
open the errant config file. This was a suggestion from @.bolinfest when
he reviewed my previous PR.
This PR fixes a small issue with chained (layered) config.toml file
merging. The old logic didn't properly handle profiles.
In particular, if a lower-layer config overrides a profile defined in a
higher-layer config, the override did not take effect. This prevents
users from having project-specific profile overrides and contradicts the
(soon-to-be) documented behavior of config merging.
The change adds a unit test for this case. It also exposes a function
from the config crate that is needed by the app server code paths to
implement support for layered configs.
Add support for returning threads by either `created_at` OR `updated_at`
descending. Previously core always returned threads ordered by
`created_at`.
This PR:
- updates core to be able to list threads by `updated_at` OR
`created_at` descending based on what the caller wants
- also update `thread/list` in app-server to expose this (default to
`created_at` if not specified)
All existing codepaths (app-server, TUI) still default to `created_at`,
so no behavior change is expected with this PR.
**Implementation**
To sort by `updated_at` is a bit nontrivial (whereas `created_at` is
easy due to the way we structure the folders and filenames on disk,
which are all based on `created_at`).
The most naive way to do this without introducing a cache file or sqlite
DB (which we have to implement/maintain) is to scan files in reverse
`created_at` order on disk, and look at the file's mtime (last modified
timestamp according to the filesystem) until we reach `MAX_SCAN_FILES`
(currently set to 10,000). Then, we can return the most recent N
threads.
Based on some quick and dirty benchmarking on my machine with ~1000
rollout files, calling `thread/list` with limit 50, the `updated_at`
path is slower as expected due to all the I/O:
- updated-at: average 103.10 ms
- created-at: average 41.10 ms
Those absolute numbers aren't a big deal IMO, but we can certainly
optimize this in a followup if needed by introducing more state stored
on disk.
**Caveat**
There's also a limitation in that any files older than `MAX_SCAN_FILES`
will be excluded, which means if a user continues a REALLY old thread,
it's possible to not be included. In practice that should not be too big
of an issue.
If a user makes...
- 1000 rollouts/day → threads older than 10 days won't show up
- 100 rollouts/day → ~100 days
If this becomes a problem for some reason, even more motivation to
implement an updated_at cache.
This PR changes `codex resume --last` to work consistently with `codex
resume`. Namely, it filters based on the cwd when selecting the last
session. It also supports the `--all` modifier as an override.
This addresses #8700
### What
Add `WebSearchMode` enum (disabled, cached live, defaults to cached) to
config + V2 protocol. This enum takes precedence over legacy flags:
`web_search_cached`, `web_search_request`, and `tools.web_search`.
Keep `--search` as live.
### Tests
Added tests
When an invalid config.toml key or value is detected, the CLI currently
just quits. This leaves the VSCE in a dead state.
This PR changes the behavior to not quit and bubble up the config error
to users to make it actionable. It also surfaces errors related to
"rules" parsing.
This allows us to surface these errors to users in the VSCE, like this:
<img width="342" height="129" alt="Screenshot 2026-01-13 at 4 29 22 PM"
src="https://github.com/user-attachments/assets/a79ffbe7-7604-400c-a304-c5165b6eebc4"
/>
<img width="346" height="244" alt="Screenshot 2026-01-13 at 4 45 06 PM"
src="https://github.com/user-attachments/assets/de874f7c-16a2-4a95-8c6d-15f10482e67b"
/>