With this PR we do not close the unified exec processes (i.e. background
terminals) at the end of a turn unless:
* The user interrupt the turn
* The user decide to clean the processes through `app-server` or
`/clean`
I made sure that `codex exec` correctly kill all the processes
This PR adds the following field to `Config`:
```rust
pub network: Option<NetworkProxy>,
```
Though for the moment, it will always be initialized as `None` (this
will be addressed in a subsequent PR).
This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we span a process.
- Defer rollout persistence for fresh threads (`InitialHistory::New`):
keep rollout events in memory and only materialize rollout file + state
DB row on first `EventMsg::UserMessage`.
- Keep precomputed rollout path available before materialization.
- Change `thread/start` to build thread response from live config
snapshot and optional precomputed path.
- Improve pre-materialization behavior in app-server/TUI: clearer
invalid-request errors for file-backed ops and a friendlier `/fork` “not
ready yet” UX.
- Update tests to match deferred semantics across
start/read/archive/unarchive/fork/resume/review flows.
- Improved resilience of user_shell test, which should be unrelated to
this change but must be affected by timing changes
For Reviewers:
* The primary change is in recorder.rs
* Most of the other changes were to fix up broken assumptions in
existing tests
Testing:
* Manually tested CLI
* Exercised app server paths by manually running IDE Extension with
rebuilt CLI binary
* Only user-visible change is that `/fork` in TUI generates visible
error if used prior to first turn
This PR makes it possible to disable live web search via an enterprise
config even if the user is running in `--yolo` mode (though cached web
search will still be available). To do this, create
`/etc/codex/requirements.toml` as follows:
```toml
# "live" is not allowed; "disabled" is allowed even though not listed explicitly.
allowed_web_search_modes = ["cached"]
```
Or set `requirements_toml_base64` MDM as explained on
https://developers.openai.com/codex/security/#locations.
### Why
- Enforce admin/MDM/`requirements.toml` constraints on web-search
behavior, independent of user config and per-turn sandbox defaults.
- Ensure per-turn config resolution and review-mode overrides never
crash when constraints are present.
### What
- Add `allowed_web_search_modes` to requirements parsing and surface it
in app-server v2 `ConfigRequirements` (`allowedWebSearchModes`), with
fixtures updated.
- Define a requirements allowlist type (`WebSearchModeRequirement`) and
normalize semantics:
- `disabled` is always implicitly allowed (even if not listed).
- An empty list is treated as `["disabled"]`.
- Make `Config.web_search_mode` a `Constrained<WebSearchMode>` and apply
requirements via `ConstrainedWithSource<WebSearchMode>`.
- Update per-turn resolution (`resolve_web_search_mode_for_turn`) to:
- Prefer `Live → Cached → Disabled` when
`SandboxPolicy::DangerFullAccess` is active (subject to requirements),
unless the user preference is explicitly `Disabled`.
- Otherwise, honor the user’s preferred mode, falling back to an allowed
mode when necessary.
- Update TUI `/debug-config` and app-server mapping to display
normalized `allowed_web_search_modes` (including implicit `disabled`).
- Fix web-search integration tests to assert cached behavior under
`SandboxPolicy::ReadOnly` (since `DangerFullAccess` legitimately prefers
`live` when allowed).
## What changed
- In `codex-rs/core/src/skills/injection.rs`, we now honor explicit
`UserInput::Skill { name, path }` first, then fall back to text mentions
only when safe.
- In `codex-rs/tui/src/bottom_pane/chat_composer.rs`, mention selection
is now token-bound (selected mention is tied to the specific inserted
`$token`), and we snapshot bindings at submit time so selection is not
lost.
- In `codex-rs/tui/src/chatwidget.rs` and
`codex-rs/tui/src/bottom_pane/mod.rs`, submit/queue paths now consume
the submit-time mention snapshot (instead of rereading cleared composer
state).
- In `codex-rs/tui/src/mention_codec.rs` and
`codex-rs/tui/src/bottom_pane/chat_composer_history.rs`, history now
round-trips mention targets so resume restores the same selected
duplicate.
- In `codex-rs/tui/src/bottom_pane/skill_popup.rs` and
`codex-rs/tui/src/bottom_pane/chat_composer.rs`, duplicate labels are
normalized to `[Repo]` / `[App]`, app rows no longer show `Connected -`,
and description space is a bit wider.
<img width="550" height="163" alt="Screenshot 2026-02-05 at 9 56 56 PM"
src="https://github.com/user-attachments/assets/346a7eb2-a342-4a49-aec8-68dfec0c7d89"
/>
<img width="550" height="163" alt="Screenshot 2026-02-05 at 9 57 09 PM"
src="https://github.com/user-attachments/assets/5e04d9af-cccf-4932-98b3-c37183e445ed"
/>
## Before vs now
- Before: selecting a duplicate could still submit the default/repo
match, and resume could lose which duplicate was originally selected.
- Now: the exact selected target (skill path or app id) is preserved
through submit, queue/restore, and resume.
## Manual test
1. Build and run this branch locally:
- `cd /Users/daniels/code/codex/codex-rs`
- `cargo build -p codex-cli --bin codex`
- `./target/debug/codex`
2. Open mention picker with `$` and pick a duplicate entry (not the
first one).
3. Confirm duplicate UI:
- repo duplicate rows show `[Repo]`
- app duplicate rows show `[App]`
- app description does **not** start with `Connected -`
4. Submit the prompt, then press Up to restore draft and submit again.
Expected: it keeps the same selected duplicate target.
5. Use `/resume` to reopen the session and send again.
Expected: restored mention still resolves to the same duplicate target.
## Summary
When replaying compacted history (especially `replacement_history` from
remote compaction), we should not keep stale developer messages from
older session state. This PR trims developer-
role messages from compacted replacement history and reinjects fresh
developer instructions derived from current turn/session state.
This aligns compaction replay behavior with the intended "fresh
instructions after summary" model.
## Problem
Compaction replay had two paths:
- `Compacted { replacement_history: None }`: rebuilt with fresh initial
context
- `Compacted { replacement_history: Some(...) }`: previously used raw
replacement history as-is
The second path could carry stale developer instructions
(permissions/personality/collab-mode guidance) across session changes.
## What Changed
### 1) Added helper to refresh compacted developer instructions
- **File:** `codex-rs/core/src/compact.rs`
- **Function:** `refresh_compacted_developer_instructions(...)`
Behavior:
- remove all `ResponseItem::Message { role: "developer", .. }` from
compacted history
- append fresh developer messages from current
`build_initial_context(...)`
### 2) Applied helper in remote compaction flow
- **File:** `codex-rs/core/src/compact_remote.rs`
- After receiving compact endpoint output, refresh developer
instructions before replacing history and persisting
`replacement_history`.
### 3) Applied helper while reconstructing history from rollout
- **File:** `codex-rs/core/src/codex.rs`
- In `reconstruct_history_from_rollout(...)`, when processing
`Compacted` entries with `replacement_history`, refresh developer
instructions instead of directly replacing with raw history.
## Non-Goals / Follow-up
This PR does **not** address the existing first-turn-after-resume
double-injection behavior.
A follow-up PR will handle resume-time dedup/idempotence separately.
If you want, I can also give you a shorter “squash-merge friendly”
version of the description.
## Codex author
`codex fork 019c25e6-706e-75d1-9198-688ec00a8256`
## Problem
The first user turn can pay websocket handshake latency even when a
session has already started. We want to reduce that initial delay while
preserving turn semantics and avoiding any prompt send during startup.
Reviewer feedback also called out duplicated connect/setup paths and
unnecessary preconnect state complexity.
## Mental model
`ModelClient` owns session-scoped transport state. During session
startup, it can opportunistically warm one websocket handshake slot. A
turn-scoped `ModelClientSession` adopts that slot once if available,
restores captured sticky turn-state, and otherwise opens a websocket
through the same shared connect path.
If startup preconnect is still in flight, first turn setup awaits that
task and treats it as the first connection attempt for the turn.
Preconnect is handshake-only. The first `response.create` is still sent
only when a turn starts.
## Non-goals
This change does not make preconnect required for correctness and does
not change prompt/turn payload semantics. It also does not expand
fallback behavior beyond clearing preconnect state when fallback
activates.
## Tradeoffs
The implementation prioritizes simpler ownership and shared connection
code over header-match gating for reuse. The single-slot cache keeps
lifecycle straightforward but only benefits the immediate next turn.
Awaiting in-flight preconnect has the same app-level connect-timeout
semantics as existing websocket connect behavior (no new timeout class
introduced by this PR).
## Architecture
`core/src/client.rs`:
- Added session-level preconnect lifecycle state (`Idle` / `InFlight` /
`Ready`) carrying one warmed websocket plus optional captured
turn-state.
- Added `pre_establish_connection()` startup warmup and `preconnect()`
handshake-only setup.
- Deduped auth/provider resolution into `current_client_setup()` and
websocket handshake wiring into `connect_websocket()` /
`build_websocket_headers()`.
- Updated turn websocket path to adopt preconnect first, await in-flight
preconnect when present, then create a new websocket only when needed.
- Ensured fallback activation clears warmed preconnect state.
- Added documentation for lifecycle, ownership, sticky-routing
invariants, and timeout semantics.
`core/src/codex.rs`:
- Session startup invokes `model_client.pre_establish_connection(...)`.
- Turn metadata resolution uses the shared timeout helper.
`core/src/turn_metadata.rs`:
- Centralized shared timeout helper used by both turn-time metadata
resolution and startup preconnect metadata building.
`core/tests/common/responses.rs` + websocket test suites:
- Added deterministic handshake waiting helper (`wait_for_handshakes`)
with bounded polling.
- Added startup preconnect and in-flight preconnect reuse coverage.
- Fallback expectations now assert exactly two websocket attempts in
covered scenarios (startup preconnect + turn attempt before fallback
sticks).
## Observability
Preconnect remains best-effort and non-fatal. Existing
websocket/fallback telemetry remains in place, and debug logs now make
preconnect-await behavior and preconnect failures easier to reason
about.
## Tests
Validated with:
1. `just fmt`
2. `cargo test -p codex-core websocket_preconnect -- --nocapture`
3. `cargo test -p codex-core websocket_fallback -- --nocapture`
4. `cargo test -p codex-core
websocket_first_turn_waits_for_inflight_preconnect -- --nocapture`
Summary
- add a `required` flag for MCP servers everywhere config/CLI data is
touched so mandatory helpers can be round-tripped
- have `codex exec` and `codex app-server` thread start/resume fail fast
when required MCPs fail to initialize
###### What
Remove special-casing that prevented auto-enabling `web_search` for
Azure model provider users. Addresses #10071, #10257.
###### Why
Azure fixed their responsesapi implementation; `web_search` is now
supported on models it wasn't before (like `gpt-5.1-codex-max`).
This request now works:
```
curl "$AZURE_API_ENDPOINT" -H "Content-Type: application/json" -H "Authorization: Bearer $AZURE_API_KEY" -d '{
"model": "gpt-5.1-codex-max",
"tools": [
{ "type": "web_search" }
],
"tool_choice": "auto",
"input": "Find the sunrise time in Paris today and cite the source."
}'
```
###### Tests
Tested with above curl, removed Azure-specific tests.
default-enablement of web_search is now client-side, no need to send
eligibility headers to backend.
Tested locally, headers no longer sent.
will wait for corresponding backend change to deploy before merging
So that the rest of the codebase (like TUI) don't need to be concerned
whether ChatGPT auth was handled by Codex itself or passed in via
app-server's external auth mode.
Summary:
- read conversation summaries and cwd info from the state DB when
possible so we no longer rely on rollout files for metadata and avoid
extra I/O
- persist CLI version in thread metadata, surface it through summary
builders, and add the necessary DB migration hooks
- simplify thread listing by using enriched state DB data directly
rather than reading rollout heads
Testing:
- Not run (not requested)
Summary
- refactor user shell command execution into a shared helper and add
modes for standalone vs active-turn execution
- run user shell commands asynchronously when a turn is already active
so they don’t replace or abort the current turn
- extend the tests to cover the new behavior and add the generated Codex
environment manifest
Testing
- Not run (not requested)
## Summary
When resuming with a different model, we should also append a developer
message with the model instructions
## Testing
- [x] Added unit tests
## Summary
This PR fixes a deterministic mismatch in remote compaction where
pre-trim estimation and the `/v1/responses/compact` payload could use
different base instructions.
Before this change:
- pre-trim estimation used model-derived instructions
(`model_info.get_model_instructions(...)`)
- compact payload used session base instructions
(`sess.get_base_instructions()`)
After this change:
- remote pre-trim estimation and compact payload both use the same
`BaseInstructions` instance from session state.
## Changes
- Added a shared estimator entry point in `ContextManager`:
- `estimate_token_count_with_base_instructions(&self, base_instructions:
&BaseInstructions) -> Option<i64>`
- Kept `estimate_token_count(&TurnContext)` as a thin wrapper that
resolves model/personality instructions and delegates to the new helper.
- Updated remote compaction flow to fetch base instructions once and
reuse it for both:
- trim preflight estimation
- compact request payload construction
- Added regression coverage for parity and behavior:
- unit test verifying explicit-base estimator behavior
- integration test proving remote compaction uses session override
instructions and trims accordingly
## Why this matters
This removes a deterministic divergence source where pre-trim could
think the request fits while the actual compact request exceeded context
because its instructions were longer/different.
## Scope
In scope:
- estimator/payload base-instructions parity in remote compaction
Out of scope:
- retry-on-`context_length_exceeded`
- compaction threshold/headroom policy changes
- broader trimming policy changes
## Codex author:
`codex fork 019c2b24-c2df-7b31-a482-fb8cf7a28559`
## Summary
When switching models, we should append the instructions of the new
model to the conversation as a developer message.
## Test
- [x] Adds a unit test
Make ModelClient a session-scoped object.
Move state that is session level onto the client, and make state that is
per-turn explicit on corresponding methods.
Stop taking a huge Config object, instead only pass in values that are
actually needed.
---------
Co-authored-by: Josh McKinney <joshka@openai.com>
Took over the work that @aaronl-openai started here:
https://github.com/openai/codex/pull/10397
Now that app-server clients are able to set up custom tools (called
`dynamic_tools` in app-server), we should expose a way for clients to
pass in not just text, but also image outputs. This is something the
Responses API already supports for function call outputs, where you can
pass in either a string or an array of content outputs (text, image,
file):
https://platform.openai.com/docs/api-reference/responses/create#responses_create-input-input_item_list-item-function_tool_call_output-output-array-input_image
So let's just plumb it through in Codex (with the caveat that we only
support text and image for now). This is implemented end-to-end across
app-server v2 protocol types and core tool handling.
## Breaking API change
NOTE: This introduces a breaking change with dynamic tools, but I think
it's ok since this concept was only recently introduced
(https://github.com/openai/codex/pull/9539) and it's better to get the
API contract correct. I don't think there are any real consumers of this
yet (not even the Codex App).
Old shape:
`{ "output": "dynamic-ok", "success": true }`
New shape:
```
{
"contentItems": [
{ "type": "inputText", "text": "dynamic-ok" },
{ "type": "inputImage", "imageUrl": "data:image/png;base64,AAA" }
]
"success": true
}
```
Add a centralized FileWatcher in codex-core (using notify) that watches
skill roots from the config layer stack (recursive)
Send `SkillsChanged` events when relevant file system changes are
detected
On `SkillsChanged`:
* Invalidate the skills cache immediately in ThreadManager
* Emit EventMsg::SkillsUpdateAvailable to active sessions
~~* Broadcast a new app-server notification:
SkillsListUpdatedNotification~~
This change does not inject new items into the event stream. That means
the agent will not know about new skills, so it won't be able to
implicitly invoke new skills. It also won't know about changes to
existing skills, so if it has already read the contents of a modified
skill, it will not honor the new behavior.
This change also does not detect modifications to AGENTS.md.
I plan to address these limitations in a follow-on PR modeled after
#9985. Injection of new skills and AGENTS was deemed to risky, hence the
need to split the feature into two stages. The changes in this PR were
designed to easily accommodate the second stage once we have some other
foundational changes in place.
Testing: In addition to automated tests, I did manual testing to confirm
that newly-created skills, deleted skills, and renamed skills are
reflected in the TUI skill picker menu. Also confirmed that
modifications to behaviors for explicitly-invoked skills are honored.
---------
Co-authored-by: Xin Lin <xl@openai.com>
## Summary
This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
curent Linux sandbox path relies on in-process restrictions (including
Landlock). Bubblewrap gives us a more uniform filesystem isolation
model, especially explicit writable roots with the option to make some
directories read-only and granular network controls.
This is behind a feature flag so we can validate behavior safely before
making it the default.
- Added temporary rollout flag:
- `features.use_linux_sandbox_bwrap`
- Preserved existing default path when the flag is off.
- In Bubblewrap mode:
- Added internal retry without /proc when /proc mount is not permitted
by the host/container.
Summary:
- replace the `sse_completed` fixture and related JSON template with
direct `responses::ev_completed` payload builders
- cascade the new SSE helpers through all affected core tests for
consistency and clarity
- remove legacy fixtures that were no longer needed once the helpers are
in place
Testing:
- Not run (not requested)
Summary
- add versioned state sqlite filename helpers and re-export them from
the state crate
- remove legacy state files when initializing the runtime and update
consumers/tests to use the new helpers
- tweak logs client description and database resolution to match the new
path
When communicating over websockets, we can't rely on headers to deliver
rate limit information. This PR adds a `codex.rate_limits` event that
the server can pass to the client to inform them about rate limit usage.
The client parses this data the same way we parse rate limit headers in
HTTP mode.
This PR also wires up the etag and reasoning headers for websockets
## Description
### What changed
- Switch the arg0 helper root from `~/.codex/tmp/path` to
`~/.codex/tmp/path2`
- Add `Arg0PathEntryGuard` to keep both the `TempDir` and an exclusive
`.lock` file alive for the process lifetime
- Add a startup janitor that scans `path2` and deletes only directories
whose lock can be acquired
### Tests
- `cargo clippy -p codex-arg0`
- `cargo clippy -p codex-core`
- `cargo test -p codex-arg0`
- `cargo test -p codex-core`
## Summary
This PR updates `request_user_input` behavior and Default-mode guidance
to match current collaboration-mode semantics and reduce model
confusion.
## Why
- `request_user_input` should be explicitly documented as **Plan-only**.
- Tool description and runtime availability checks should be driven by
the **same centralized mode policy**.
- Default mode prompt needed stronger execution guidance and explicit
instruction that `request_user_input` is unavailable.
- Error messages should report the **actual mode name** (not aliases
that can read as misleading).
## What changed
- Centralized `request_user_input` mode policy in `core` handler logic:
- Added a single allowed-modes config (`Plan` only).
- Reused that policy for:
- runtime rejection messaging
- tool description text
- Updated tool description to include availability constraint:
- `"This tool is only available in Plan mode."`
- Updated runtime rejection behavior:
- `Default` -> `"request_user_input is unavailable in Default mode"`
- `Execute` -> `"request_user_input is unavailable in Execute mode"`
- `PairProgramming` -> `"request_user_input is unavailable in Pair
Programming mode"`
- Strengthened Default collaboration prompt:
- Added explicit execution-first behavior
- Added assumptions-first guidance
- Added explicit `request_user_input` unavailability instruction
- Added concise progress-reporting expectations
- Simplified formatting implementation:
- Inlined allowed-mode name collection into `format_allowed_modes()`
- Kept `format_allowed_modes()` output for 3+ modes as CSV style
(`modes: a,b,c`)
Summary
- mark the shell-related tools as supporting parallel tool calls so
exec_command, shell_command, etc. can run concurrently
- update expectations in tool parallelism tests to reflect the new
parallel behavior
- drop the unused serial duration helper from the suite
Testing
- Not run (not requested)
## Summary
This PR simplifies collaboration modes to the visible set `default |
plan`, while preserving backward compatibility for older partners that
may still send legacy mode
names.
Specifically:
- Renames the old Code behavior to **Default**.
- Keeps **Plan** as-is.
- Removes **Custom** mode behavior (fallbacks now resolve to Default).
- Keeps `PairProgramming` and `Execute` internally for compatibility
plumbing, while removing them from schema/API and UI visibility.
- Adds legacy input aliasing so older clients can still send old mode
names.
## What Changed
1. Mode enum and compatibility
- `ModeKind` now uses `Plan` + `Default` as active/public modes.
- `ModeKind::Default` deserialization accepts legacy values:
- `code`
- `pair_programming`
- `execute`
- `custom`
- `PairProgramming` and `Execute` variants remain in code but are hidden
from protocol/schema generation.
- `Custom` variant is removed; previous custom fallbacks now map to
`Default`.
2. Collaboration presets and templates
- Built-in presets now return only:
- `Plan`
- `Default`
- Template rename:
- `core/templates/collaboration_mode/code.md` -> `default.md`
- `execute.md` and `pair_programming.md` remain on disk but are not
surfaced in visible preset lists.
3. TUI updates
- Updated user-facing naming and prompts from “Code” to “Default”.
- Updated mode-cycle and indicator behavior to reflect only visible
`Plan` and `Default`.
- Updated corresponding tests and snapshots.
4. request_user_input behavior
- `request_user_input` remains allowed only in `Plan` mode.
- Rejection messaging now consistently treats non-plan modes as
`Default`.
5. Schemas
- Regenerated config and app-server schemas.
- Public schema types now advertise mode values as:
- `plan`
- `default`
## Backward Compatibility Notes
- Incoming legacy mode names (`code`, `pair_programming`, `execute`,
`custom`) are accepted and coerced to `default`.
- Outgoing/public schema surfaces intentionally expose only `plan |
default`.
- This allows tolerant ingestion of older partner payloads while
standardizing new integrations on the reduced mode set.
## Codex author
`codex fork 019c1fae-693b-7840-b16e-9ad38ea0bd00`
Two fixes:
1. Include trailing tool output in the total context size calculation.
Otherwise when checking whether compaction should run we ignore newly
added outputs.
2. Trim trailing tool output/tool calls until we can fit the request
into the model context size. Otherwise the compaction endpoint will fail
to compact. We only trim items that can be reproduced again by the model
(tool calls, tool call outputs).