Commit Graph

547 Commits

Author SHA1 Message Date
jif-oai
6cf61725d0 feat: do not close unified exec processes across turns (#10799)
With this PR we do not close the unified exec processes (i.e. background
terminals) at the end of a turn unless:
* The user interrupt the turn
* The user decide to clean the processes through `app-server` or
`/clean`

I made sure that `codex exec` correctly kill all the processes
2026-02-09 10:27:46 +00:00
Michael Bolin
383b45279e feat: include NetworkConfig through ExecParams (#11105)
This PR adds the following field to `Config`:

```rust
pub network: Option<NetworkProxy>,
```

Though for the moment, it will always be initialized as `None` (this
will be addressed in a subsequent PR).

This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we span a process.
2026-02-09 03:32:17 +00:00
Eric Traut
b3de6c7f2b Defer persistence of rollout file (#11028)
- Defer rollout persistence for fresh threads (`InitialHistory::New`):
keep rollout events in memory and only materialize rollout file + state
DB row on first `EventMsg::UserMessage`.
- Keep precomputed rollout path available before materialization.
- Change `thread/start` to build thread response from live config
snapshot and optional precomputed path.
- Improve pre-materialization behavior in app-server/TUI: clearer
invalid-request errors for file-backed ops and a friendlier `/fork` “not
ready yet” UX.
- Update tests to match deferred semantics across
start/read/archive/unarchive/fork/resume/review flows.
- Improved resilience of user_shell test, which should be unrelated to
this change but must be affected by timing changes

For Reviewers:
* The primary change is in recorder.rs
* Most of the other changes were to fix up broken assumptions in
existing tests

Testing:
* Manually tested CLI
* Exercised app server paths by manually running IDE Extension with
rebuilt CLI binary
* Only user-visible change is that `/fork` in TUI generates visible
error if used prior to first turn
2026-02-07 23:05:03 -08:00
pakrym-oai
6d08298f4e Fallback to HTTP on UPGRADE_REQUIRED (#10824)
Allow the server to trigger a connection downgrade in case the protocol
changes in incompatible ways.
2026-02-08 05:06:33 +00:00
pakrym-oai
8fe5066bcc Simplify pre-connect (#11040) 2026-02-07 15:52:03 -08:00
Michael Bolin
a118494323 feat: add support for allowed_web_search_modes in requirements.toml (#10964)
This PR makes it possible to disable live web search via an enterprise
config even if the user is running in `--yolo` mode (though cached web
search will still be available). To do this, create
`/etc/codex/requirements.toml` as follows:

```toml
# "live" is not allowed; "disabled" is allowed even though not listed explicitly.
allowed_web_search_modes = ["cached"]
```

Or set `requirements_toml_base64` MDM as explained on
https://developers.openai.com/codex/security/#locations.

### Why
- Enforce admin/MDM/`requirements.toml` constraints on web-search
behavior, independent of user config and per-turn sandbox defaults.
- Ensure per-turn config resolution and review-mode overrides never
crash when constraints are present.

### What
- Add `allowed_web_search_modes` to requirements parsing and surface it
in app-server v2 `ConfigRequirements` (`allowedWebSearchModes`), with
fixtures updated.
- Define a requirements allowlist type (`WebSearchModeRequirement`) and
normalize semantics:
  - `disabled` is always implicitly allowed (even if not listed).
  - An empty list is treated as `["disabled"]`.
- Make `Config.web_search_mode` a `Constrained<WebSearchMode>` and apply
requirements via `ConstrainedWithSource<WebSearchMode>`.
- Update per-turn resolution (`resolve_web_search_mode_for_turn`) to:
- Prefer `Live → Cached → Disabled` when
`SandboxPolicy::DangerFullAccess` is active (subject to requirements),
unless the user preference is explicitly `Disabled`.
- Otherwise, honor the user’s preferred mode, falling back to an allowed
mode when necessary.
- Update TUI `/debug-config` and app-server mapping to display
normalized `allowed_web_search_modes` (including implicit `disabled`).
- Fix web-search integration tests to assert cached behavior under
`SandboxPolicy::ReadOnly` (since `DangerFullAccess` legitimately prefers
`live` when allowed).
2026-02-07 05:55:15 +00:00
daniel-oai
84bce2b8e6 TUI/Core: preserve duplicate skill/app mention selection across submit + resume (#10855)
## What changed

- In `codex-rs/core/src/skills/injection.rs`, we now honor explicit
`UserInput::Skill { name, path }` first, then fall back to text mentions
only when safe.
- In `codex-rs/tui/src/bottom_pane/chat_composer.rs`, mention selection
is now token-bound (selected mention is tied to the specific inserted
`$token`), and we snapshot bindings at submit time so selection is not
lost.
- In `codex-rs/tui/src/chatwidget.rs` and
`codex-rs/tui/src/bottom_pane/mod.rs`, submit/queue paths now consume
the submit-time mention snapshot (instead of rereading cleared composer
state).
- In `codex-rs/tui/src/mention_codec.rs` and
`codex-rs/tui/src/bottom_pane/chat_composer_history.rs`, history now
round-trips mention targets so resume restores the same selected
duplicate.
- In `codex-rs/tui/src/bottom_pane/skill_popup.rs` and
`codex-rs/tui/src/bottom_pane/chat_composer.rs`, duplicate labels are
normalized to `[Repo]` / `[App]`, app rows no longer show `Connected -`,
and description space is a bit wider.

<img width="550" height="163" alt="Screenshot 2026-02-05 at 9 56 56 PM"
src="https://github.com/user-attachments/assets/346a7eb2-a342-4a49-aec8-68dfec0c7d89"
/>
<img width="550" height="163" alt="Screenshot 2026-02-05 at 9 57 09 PM"
src="https://github.com/user-attachments/assets/5e04d9af-cccf-4932-98b3-c37183e445ed"
/>


## Before vs now

- Before: selecting a duplicate could still submit the default/repo
match, and resume could lose which duplicate was originally selected.
- Now: the exact selected target (skill path or app id) is preserved
through submit, queue/restore, and resume.

## Manual test

1. Build and run this branch locally:
   - `cd /Users/daniels/code/codex/codex-rs`
   - `cargo build -p codex-cli --bin codex`
   - `./target/debug/codex`
2. Open mention picker with `$` and pick a duplicate entry (not the
first one).
3. Confirm duplicate UI:
   - repo duplicate rows show `[Repo]`
   - app duplicate rows show `[App]`
   - app description does **not** start with `Connected -`
4. Submit the prompt, then press Up to restore draft and submit again.  
   Expected: it keeps the same selected duplicate target.
5. Use `/resume` to reopen the session and send again.  
Expected: restored mention still resolves to the same duplicate target.
2026-02-06 15:59:00 -08:00
alexsong-oai
daeef06bec add originator to otel (#10826) 2026-02-06 15:13:56 -08:00
Brian Yu
1fbf5ed06f Support alternative websocket API (#10861)
**Test plan**

```
cargo build -p codex-cli && RUST_LOG='codex_api::endpoint::responses_websocket=trace,codex_core::client=debug,codex_core::codex=debug' \
  ./target/debug/codex \
    --enable responses_websockets_v2 \
    --profile byok \
    --full-auto
```
2026-02-06 14:40:50 -08:00
Ahmed Ibrahim
ba8b5d9018 Treat compaction failure as failure state (#10927)
- Return compaction errors from local and remote compaction flows.\n-
Stop turns/tasks when auto-compaction fails instead of continuing
execution.
2026-02-06 13:51:46 -08:00
Charley Cunningham
143daadb31 core: refresh developer instructions after compaction replacement history (#10574)
## Summary

When replaying compacted history (especially `replacement_history` from
remote compaction), we should not keep stale developer messages from
older session state. This PR trims developer-
role messages from compacted replacement history and reinjects fresh
developer instructions derived from current turn/session state.

This aligns compaction replay behavior with the intended "fresh
instructions after summary" model.

## Problem

Compaction replay had two paths:

- `Compacted { replacement_history: None }`: rebuilt with fresh initial
context
- `Compacted { replacement_history: Some(...) }`: previously used raw
replacement history as-is

The second path could carry stale developer instructions
(permissions/personality/collab-mode guidance) across session changes.

## What Changed

### 1) Added helper to refresh compacted developer instructions

- **File:** `codex-rs/core/src/compact.rs`
- **Function:** `refresh_compacted_developer_instructions(...)`

Behavior:
- remove all `ResponseItem::Message { role: "developer", .. }` from
compacted history
- append fresh developer messages from current
`build_initial_context(...)`

### 2) Applied helper in remote compaction flow

- **File:** `codex-rs/core/src/compact_remote.rs`
- After receiving compact endpoint output, refresh developer
instructions before replacing history and persisting
`replacement_history`.

### 3) Applied helper while reconstructing history from rollout

- **File:** `codex-rs/core/src/codex.rs`
- In `reconstruct_history_from_rollout(...)`, when processing
`Compacted` entries with `replacement_history`, refresh developer
instructions instead of directly replacing with raw history.

## Non-Goals / Follow-up

This PR does **not** address the existing first-turn-after-resume
double-injection behavior.
A follow-up PR will handle resume-time dedup/idempotence separately.

If you want, I can also give you a shorter “squash-merge friendly”
version of the description.

## Codex author
`codex fork 019c25e6-706e-75d1-9198-688ec00a8256`
2026-02-06 12:25:08 -08:00
Josh McKinney
e416e578bb core: preconnect Responses websocket for first turn (#10698)
## Problem
The first user turn can pay websocket handshake latency even when a
session has already started. We want to reduce that initial delay while
preserving turn semantics and avoiding any prompt send during startup.

Reviewer feedback also called out duplicated connect/setup paths and
unnecessary preconnect state complexity.

## Mental model
`ModelClient` owns session-scoped transport state. During session
startup, it can opportunistically warm one websocket handshake slot. A
turn-scoped `ModelClientSession` adopts that slot once if available,
restores captured sticky turn-state, and otherwise opens a websocket
through the same shared connect path.

If startup preconnect is still in flight, first turn setup awaits that
task and treats it as the first connection attempt for the turn.

Preconnect is handshake-only. The first `response.create` is still sent
only when a turn starts.

## Non-goals
This change does not make preconnect required for correctness and does
not change prompt/turn payload semantics. It also does not expand
fallback behavior beyond clearing preconnect state when fallback
activates.

## Tradeoffs
The implementation prioritizes simpler ownership and shared connection
code over header-match gating for reuse. The single-slot cache keeps
lifecycle straightforward but only benefits the immediate next turn.

Awaiting in-flight preconnect has the same app-level connect-timeout
semantics as existing websocket connect behavior (no new timeout class
introduced by this PR).

## Architecture
`core/src/client.rs`:
- Added session-level preconnect lifecycle state (`Idle` / `InFlight` /
`Ready`) carrying one warmed websocket plus optional captured
turn-state.
- Added `pre_establish_connection()` startup warmup and `preconnect()`
handshake-only setup.
- Deduped auth/provider resolution into `current_client_setup()` and
websocket handshake wiring into `connect_websocket()` /
`build_websocket_headers()`.
- Updated turn websocket path to adopt preconnect first, await in-flight
preconnect when present, then create a new websocket only when needed.
- Ensured fallback activation clears warmed preconnect state.
- Added documentation for lifecycle, ownership, sticky-routing
invariants, and timeout semantics.

`core/src/codex.rs`:
- Session startup invokes `model_client.pre_establish_connection(...)`.
- Turn metadata resolution uses the shared timeout helper.

`core/src/turn_metadata.rs`:
- Centralized shared timeout helper used by both turn-time metadata
resolution and startup preconnect metadata building.

`core/tests/common/responses.rs` + websocket test suites:
- Added deterministic handshake waiting helper (`wait_for_handshakes`)
with bounded polling.
- Added startup preconnect and in-flight preconnect reuse coverage.
- Fallback expectations now assert exactly two websocket attempts in
covered scenarios (startup preconnect + turn attempt before fallback
sticks).

## Observability
Preconnect remains best-effort and non-fatal. Existing
websocket/fallback telemetry remains in place, and debug logs now make
preconnect-await behavior and preconnect failures easier to reason
about.

## Tests
Validated with:
1. `just fmt`
2. `cargo test -p codex-core websocket_preconnect -- --nocapture`
3. `cargo test -p codex-core websocket_fallback -- --nocapture`
4. `cargo test -p codex-core
websocket_first_turn_waits_for_inflight_preconnect -- --nocapture`
2026-02-06 19:08:24 +00:00
jif-oai
aab61934af Handle required MCP startup failures across components (#10902)
Summary
- add a `required` flag for MCP servers everywhere config/CLI data is
touched so mandatory helpers can be round-tripped
- have `codex exec` and `codex app-server` thread start/resume fail fast
when required MCPs fail to initialize
2026-02-06 17:14:37 +01:00
Eric Traut
dd80e332c4 Removed the "remote_compaction" feature flag (#10840)
This feature is always on now
2026-02-05 23:54:57 -08:00
Anton Panasenko
4ee039744e feat: expose detailed metrics to runtime metrics (#10699) 2026-02-05 18:22:30 -08:00
pakrym-oai
dbe47ea01a Send beta header with websocket connects (#10727) 2026-02-05 15:05:02 -08:00
sayan-oai
378f1cabe8 go back to auto-enabling web_search for azure (#10820)
###### What
Remove special-casing that prevented auto-enabling `web_search` for
Azure model provider users. Addresses #10071, #10257.

###### Why
Azure fixed their responsesapi implementation; `web_search` is now
supported on models it wasn't before (like `gpt-5.1-codex-max`).

This request now works:
```
curl "$AZURE_API_ENDPOINT" -H "Content-Type: application/json" -H "Authorization: Bearer $AZURE_API_KEY" -d '{
  "model": "gpt-5.1-codex-max",
  "tools": [
    { "type": "web_search" }
  ],
  "tool_choice": "auto",
  "input": "Find the sunrise time in Paris today and cite the source."
}'
```

###### Tests
Tested with above curl, removed Azure-specific tests.
2026-02-05 14:57:07 -08:00
jif-oai
428a9f6035 feat: wait for backfill to be ready (#10790) 2026-02-05 20:45:16 +00:00
sayan-oai
5fdf6f5efa chore: rm web-search-eligible header (#10660)
default-enablement of web_search is now client-side, no need to send
eligibility headers to backend.

Tested locally, headers no longer sent.

will wait for corresponding backend change to deploy before merging
2026-02-05 11:48:34 -08:00
Owen Lin
3582b74d01 fix(auth): isolate chatgptAuthTokens concept to auth manager and app-server (#10423)
So that the rest of the codebase (like TUI) don't need to be concerned
whether ChatGPT auth was handled by Codex itself or passed in via
app-server's external auth mode.
2026-02-05 10:46:06 -08:00
jif-oai
9ee746afd6 Leverage state DB metadata for thread summaries (#10621)
Summary:
- read conversation summaries and cwd info from the state DB when
possible so we no longer rely on rollout files for metadata and avoid
extra I/O
- persist CLI version in thread metadata, surface it through summary
builders, and add the necessary DB migration hooks
- simplify thread listing by using enriched state DB data directly
rather than reading rollout heads

Testing:
- Not run (not requested)
2026-02-05 16:39:11 +00:00
jif-oai
41f3b1ba0b feat: add memory tool (#10637)
Add a tool for memory to retrieve a full memory based on the memory ID
2026-02-05 16:16:31 +00:00
jif-oai
97582ac52d Allow user shell commands to run alongside active turns (#10513)
Summary
- refactor user shell command execution into a shared helper and add
modes for standalone vs active-turn execution
- run user shell commands asynchronously when a turn is already active
so they don’t replace or abort the current turn
- extend the tests to cover the new behavior and add the generated Codex
environment manifest

Testing
- Not run (not requested)
2026-02-05 11:11:00 +00:00
Dylan Hurd
fe8b474acd fix(core,app-server) resume with different model (#10719)
## Summary
When resuming with a different model, we should also append a developer
message with the model instructions

## Testing
- [x] Added unit tests
2026-02-05 00:40:05 -08:00
Charley Cunningham
dc7007beaa Fix remote compaction estimator/payload instruction small mismatch (#10692)
## Summary
This PR fixes a deterministic mismatch in remote compaction where
pre-trim estimation and the `/v1/responses/compact` payload could use
different base instructions.

Before this change:
- pre-trim estimation used model-derived instructions
(`model_info.get_model_instructions(...)`)
- compact payload used session base instructions
(`sess.get_base_instructions()`)

After this change:
- remote pre-trim estimation and compact payload both use the same
`BaseInstructions` instance from session state.

## Changes
- Added a shared estimator entry point in `ContextManager`:
- `estimate_token_count_with_base_instructions(&self, base_instructions:
&BaseInstructions) -> Option<i64>`
- Kept `estimate_token_count(&TurnContext)` as a thin wrapper that
resolves model/personality instructions and delegates to the new helper.
- Updated remote compaction flow to fetch base instructions once and
reuse it for both:
  - trim preflight estimation
  - compact request payload construction
- Added regression coverage for parity and behavior:
  - unit test verifying explicit-base estimator behavior
- integration test proving remote compaction uses session override
instructions and trims accordingly

## Why this matters
This removes a deterministic divergence source where pre-trim could
think the request fits while the actual compact request exceeded context
because its instructions were longer/different.

## Scope
In scope:
- estimator/payload base-instructions parity in remote compaction

Out of scope:
- retry-on-`context_length_exceeded`
- compaction threshold/headroom policy changes
- broader trimming policy changes

## Codex author:
`codex fork 019c2b24-c2df-7b31-a482-fb8cf7a28559`
2026-02-04 23:24:06 -08:00
Dylan Hurd
e482978261 fix(core) switching model appends model instructions (#10651)
## Summary
When switching models, we should append the instructions of the new
model to the conversation as a developer message.

## Test
- [x] Adds a unit test
2026-02-05 05:50:38 +00:00
Dylan Hurd
a05aadfa1b chore(config) Default Personality Pragmatic (#10705)
## Summary
Switch back to Pragmatic personality

## Testing
- [x] Updated unit tests
2026-02-04 21:22:47 -08:00
Dylan Hurd
73f32840c6 chore(core) personality migration tests (#10650)
## Summary
Adds additional tests for personality edge cases

## Testing
- [x] These are tests
2026-02-04 19:03:14 -08:00
pakrym-oai
0e8d359da9 Session-level model client (#10664)
Make ModelClient a session-scoped object.
Move state that is session level onto the client, and make state that is
per-turn explicit on corresponding methods.
Stop taking a huge Config object, instead only pass in values that are
actually needed.

---------

Co-authored-by: Josh McKinney <joshka@openai.com>
2026-02-04 16:58:48 -08:00
Owen Lin
5ea107a088 feat(app-server, core): allow text + image content items for dynamic tool outputs (#10567)
Took over the work that @aaronl-openai started here:
https://github.com/openai/codex/pull/10397

Now that app-server clients are able to set up custom tools (called
`dynamic_tools` in app-server), we should expose a way for clients to
pass in not just text, but also image outputs. This is something the
Responses API already supports for function call outputs, where you can
pass in either a string or an array of content outputs (text, image,
file):
https://platform.openai.com/docs/api-reference/responses/create#responses_create-input-input_item_list-item-function_tool_call_output-output-array-input_image

So let's just plumb it through in Codex (with the caveat that we only
support text and image for now). This is implemented end-to-end across
app-server v2 protocol types and core tool handling.

## Breaking API change
NOTE: This introduces a breaking change with dynamic tools, but I think
it's ok since this concept was only recently introduced
(https://github.com/openai/codex/pull/9539) and it's better to get the
API contract correct. I don't think there are any real consumers of this
yet (not even the Codex App).

Old shape:
`{ "output": "dynamic-ok", "success": true }`

New shape:
```
{
    "contentItems": [
      { "type": "inputText", "text": "dynamic-ok" },
      { "type": "inputImage", "imageUrl": "data:image/png;base64,AAA" }
    ]
  "success": true
}
```
2026-02-04 16:12:47 -08:00
Ahmed Ibrahim
f9c38f531c add none personality option (#10688)
- add none personality enum value and empty placeholder behavior\n- add
docs/schema updates and e2e coverage
2026-02-04 15:40:33 -08:00
Eric Traut
7bcc552325 Added support for live updates to skills (#10478)
Add a centralized FileWatcher in codex-core (using notify) that watches
skill roots from the config layer stack (recursive)

Send `SkillsChanged` events when relevant file system changes are
detected

On `SkillsChanged`:
* Invalidate the skills cache immediately in ThreadManager
* Emit EventMsg::SkillsUpdateAvailable to active sessions
~~* Broadcast a new app-server notification:
SkillsListUpdatedNotification~~

This change does not inject new items into the event stream. That means
the agent will not know about new skills, so it won't be able to
implicitly invoke new skills. It also won't know about changes to
existing skills, so if it has already read the contents of a modified
skill, it will not honor the new behavior.

This change also does not detect modifications to AGENTS.md.

I plan to address these limitations in a follow-on PR modeled after
#9985. Injection of new skills and AGENTS was deemed to risky, hence the
need to split the feature into two stages. The changes in this PR were
designed to easily accommodate the second stage once we have some other
foundational changes in place.

Testing: In addition to automated tests, I did manual testing to confirm
that newly-created skills, deleted skills, and renamed skills are
reflected in the TUI skill picker menu. Also confirmed that
modifications to behaviors for explicitly-invoked skills are honored.

---------

Co-authored-by: Xin Lin <xl@openai.com>
2026-02-04 15:25:03 -08:00
Ahmed Ibrahim
7a253076fe Persist pending input user events (#10656)
- Persist user-message events for mid-turn injected input by emitting
user message turn items when pending input is recorded.
2026-02-04 11:47:10 -08:00
viyatb-oai
ae4de43ccc feat(linux-sandbox): add bwrap support (#9938)
## Summary
This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
curent Linux sandbox path relies on in-process restrictions (including
Landlock). Bubblewrap gives us a more uniform filesystem isolation
model, especially explicit writable roots with the option to make some
directories read-only and granular network controls.

This is behind a feature flag so we can validate behavior safely before
making it the default.

- Added temporary rollout flag:
  - `features.use_linux_sandbox_bwrap`
- Preserved existing default path when the flag is off.
- In Bubblewrap mode:
- Added internal retry without /proc when /proc mount is not permitted
by the host/container.
2026-02-04 11:13:17 -08:00
jif-oai
71e63f8d10 fix: flaky test (#10644) 2026-02-04 17:59:22 +00:00
jif-oai
49dd67a260 feat: land unified_exec (#10641)
Land `unified_exec` for all non-windows OS
2026-02-04 16:39:41 +00:00
pakrym-oai
0efd33f7f4 Update tests to stop using sse_completed fixture (#10638)
Summary:
- replace the `sse_completed` fixture and related JSON template with
direct `responses::ev_completed` payload builders
- cascade the new SSE helpers through all affected core tests for
consistency and clarity
- remove legacy fixtures that were no longer needed once the helpers are
in place

Testing:
- Not run (not requested)
2026-02-04 08:38:06 -08:00
jif-oai
583e5d4f41 Migrate state DB path helpers to versioned filename (#10623)
Summary
- add versioned state sqlite filename helpers and re-export them from
the state crate
- remove legacy state files when initializing the runtime and update
consumers/tests to use the new helpers
- tweak logs client description and database resolution to match the new
path
2026-02-04 14:31:12 +00:00
Rasmus Rygaard
df000da917 Add a codex.rate_limits event for websockets (#10324)
When communicating over websockets, we can't rely on headers to deliver
rate limit information. This PR adds a `codex.rate_limits` event that
the server can pass to the client to inform them about rate limit usage.
The client parses this data the same way we parse rate limit headers in
HTTP mode.

This PR also wires up the etag and reasoning headers for websockets
2026-02-04 06:01:47 -08:00
jif-oai
61aecdde66 fix: make sure file exist in find_thread_path_by_id_str_in_subdir (#10618) 2026-02-04 13:01:17 +00:00
Anton Panasenko
fcaed4cb88 feat: log webscocket timing into runtime metrics (#10577) 2026-02-03 18:04:07 -08:00
viyatb-oai
08926a3fb7 chore(arg0): advisory-lock janitor for codex tmp paths (#10039)
## Description

### What changed
- Switch the arg0 helper root from `~/.codex/tmp/path` to
`~/.codex/tmp/path2`
- Add `Arg0PathEntryGuard` to keep both the `TempDir` and an exclusive
`.lock` file alive for the process lifetime
- Add a startup janitor that scans `path2` and deletes only directories
whose lock can be acquired

### Tests
- `cargo clippy -p codex-arg0`
- `cargo clippy -p codex-core`
- `cargo test -p codex-arg0`
- `cargo test -p codex-core`
2026-02-03 21:38:31 +00:00
Charley Cunningham
998eb8f32b Improve Default mode prompt (less confusion with Plan mode) (#10545)
## Summary

This PR updates `request_user_input` behavior and Default-mode guidance
to match current collaboration-mode semantics and reduce model
confusion.

## Why

- `request_user_input` should be explicitly documented as **Plan-only**.
- Tool description and runtime availability checks should be driven by
the **same centralized mode policy**.
- Default mode prompt needed stronger execution guidance and explicit
instruction that `request_user_input` is unavailable.
- Error messages should report the **actual mode name** (not aliases
that can read as misleading).

## What changed

- Centralized `request_user_input` mode policy in `core` handler logic:
  - Added a single allowed-modes config (`Plan` only).
  - Reused that policy for:
    - runtime rejection messaging
    - tool description text
- Updated tool description to include availability constraint:
  - `"This tool is only available in Plan mode."`
- Updated runtime rejection behavior:
  - `Default` -> `"request_user_input is unavailable in Default mode"`
  - `Execute` -> `"request_user_input is unavailable in Execute mode"`
- `PairProgramming` -> `"request_user_input is unavailable in Pair
Programming mode"`
- Strengthened Default collaboration prompt:
  - Added explicit execution-first behavior
  - Added assumptions-first guidance
  - Added explicit `request_user_input` unavailability instruction
  - Added concise progress-reporting expectations
- Simplified formatting implementation:
  - Inlined allowed-mode name collection into `format_allowed_modes()`
- Kept `format_allowed_modes()` output for 3+ modes as CSV style
(`modes: a,b,c`)
2026-02-03 12:08:38 -08:00
jif-oai
c38a5958d7 feat: find_thread_path_by_id_str_in_subdir from DB (#10532) 2026-02-03 19:09:04 +00:00
jif-oai
33dc93e4d2 Enable parallel shell tools (#10505)
Summary
- mark the shell-related tools as supporting parallel tool calls so
exec_command, shell_command, etc. can run concurrently
- update expectations in tool parallelism tests to reflect the new
parallel behavior
- drop the unused serial duration helper from the suite

Testing
- Not run (not requested)
2026-02-03 18:05:02 +00:00
Charley Cunningham
d509df676b Cleanup collaboration mode variants (#10404)
## Summary

This PR simplifies collaboration modes to the visible set `default |
plan`, while preserving backward compatibility for older partners that
may still send legacy mode
names.

Specifically:
- Renames the old Code behavior to **Default**.
- Keeps **Plan** as-is.
- Removes **Custom** mode behavior (fallbacks now resolve to Default).
- Keeps `PairProgramming` and `Execute` internally for compatibility
plumbing, while removing them from schema/API and UI visibility.
- Adds legacy input aliasing so older clients can still send old mode
names.

## What Changed

1. Mode enum and compatibility
- `ModeKind` now uses `Plan` + `Default` as active/public modes.
- `ModeKind::Default` deserialization accepts legacy values:
  - `code`
  - `pair_programming`
  - `execute`
  - `custom`
- `PairProgramming` and `Execute` variants remain in code but are hidden
from protocol/schema generation.
- `Custom` variant is removed; previous custom fallbacks now map to
`Default`.

2. Collaboration presets and templates
- Built-in presets now return only:
  - `Plan`
  - `Default`
- Template rename:
  - `core/templates/collaboration_mode/code.md` -> `default.md`
- `execute.md` and `pair_programming.md` remain on disk but are not
surfaced in visible preset lists.

3. TUI updates
- Updated user-facing naming and prompts from “Code” to “Default”.
- Updated mode-cycle and indicator behavior to reflect only visible
`Plan` and `Default`.
- Updated corresponding tests and snapshots.

4. request_user_input behavior
- `request_user_input` remains allowed only in `Plan` mode.
- Rejection messaging now consistently treats non-plan modes as
`Default`.

5. Schemas
- Regenerated config and app-server schemas.
- Public schema types now advertise mode values as:
  - `plan`
  - `default`

## Backward Compatibility Notes

- Incoming legacy mode names (`code`, `pair_programming`, `execute`,
`custom`) are accepted and coerced to `default`.
- Outgoing/public schema surfaces intentionally expose only `plan |
default`.
- This allows tolerant ingestion of older partner payloads while
standardizing new integrations on the reduced mode set.

## Codex author
`codex fork 019c1fae-693b-7840-b16e-9ad38ea0bd00`
2026-02-03 09:23:53 -08:00
jif-oai
d2394a2494 chore: nuke chat/completions API (#10157) 2026-02-03 11:31:57 +00:00
pakrym-oai
53d8474061 Ignore remote_compact_trims_function_call_history_to_fit_context_window on windows (#10474) 2026-02-02 22:47:38 -08:00
sayan-oai
59707da857 fix: clarify deprecation message for features.web_search (#10406)
clarify that the new `web_search` is not a feature flag under
`[features]` in the deprecation CTA
2026-02-02 21:17:01 -08:00
pakrym-oai
cbfd2a37cc Trim compaction input (#10374)
Two fixes:

1. Include trailing tool output in the total context size calculation.
Otherwise when checking whether compaction should run we ignore newly
added outputs.
2. Trim trailing tool output/tool calls until we can fit the request
into the model context size. Otherwise the compaction endpoint will fail
to compact. We only trim items that can be reproduced again by the model
(tool calls, tool call outputs).
2026-02-02 19:03:11 -08:00