Commit Graph

88 Commits

Author SHA1 Message Date
Anton Panasenko
becc3a0424 feat: search_tool (#10657)
**Why We Did This**
- The goal is to reduce MCP tool context pollution by not exposing the
full MCP tool list up front
- It forces an explicit discovery step (`search_tool_bm25`) so the model
narrows tool scope before making MCP calls, which helps relevance and
lowers prompt/tool clutter.

**What It Changed**
- Added a new experimental feature flag `search_tool` in
`core/src/features.rs:90` and `core/src/features.rs:430`.
- Added config/schema support for that flag in
`core/config.schema.json:214` and `core/config.schema.json:1235`.
- Added BM25 dependency (`bm25`) in `Cargo.toml:129` and
`core/Cargo.toml:23`.
- Added new tool handler `search_tool_bm25` in
`core/src/tools/handlers/search_tool_bm25.rs:18`.
- Registered the handler and tool spec in
`core/src/tools/handlers/mod.rs:11` and `core/src/tools/spec.rs:780` and
`core/src/tools/spec.rs:1344`.
- Extended `ToolsConfig` to carry `search_tool` enablement in
`core/src/tools/spec.rs:32` and `core/src/tools/spec.rs:56`.
- Injected dedicated developer instructions for tool-discovery workflow
in `core/src/codex.rs:483` and `core/src/codex.rs:1976`, using
`core/templates/search_tool/developer_instructions.md:1`.
- Added session state to store one-shot selected MCP tools in
`core/src/state/session.rs:27` and `core/src/state/session.rs:131`.
- Added filtering so when feature is enabled, only selected MCP tools
are exposed on the next request (then consumed) in
`core/src/codex.rs:3800` and `core/src/codex.rs:3843`.
- Added E2E suite coverage for
enablement/instructions/hide-until-search/one-turn-selection in
`core/tests/suite/search_tool.rs:72`,
`core/tests/suite/search_tool.rs:109`,
`core/tests/suite/search_tool.rs:147`, and
`core/tests/suite/search_tool.rs:218`.
- Refactored test helper utilities to support config-driven tool
collection in `core/tests/suite/tools.rs:281`.

**Net Behavioral Effect**
- With `search_tool` **off**: existing MCP behavior (tools exposed
normally).
- With `search_tool` **on**: MCP tools start hidden, model must call
`search_tool_bm25`, and only returned `selected_tools` are available for
the next model call.
2026-02-09 12:53:50 -08:00
pakrym-oai
ccd17374cb Move warmup to the task level (#11216)
Instead of storing a special connection on the client level make the
regular task responsible for establishing a normal client session and
open a connection on it.

Then when the turn is started we pass in a pre-established session.
2026-02-09 10:57:52 -08:00
jif-oai
6cf61725d0 feat: do not close unified exec processes across turns (#10799)
With this PR we do not close the unified exec processes (i.e. background
terminals) at the end of a turn unless:
* The user interrupt the turn
* The user decide to clean the processes through `app-server` or
`/clean`

I made sure that `codex exec` correctly kill all the processes
2026-02-09 10:27:46 +00:00
Michael Bolin
383b45279e feat: include NetworkConfig through ExecParams (#11105)
This PR adds the following field to `Config`:

```rust
pub network: Option<NetworkProxy>,
```

Though for the moment, it will always be initialized as `None` (this
will be addressed in a subsequent PR).

This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we span a process.
2026-02-09 03:32:17 +00:00
Eric Traut
b3de6c7f2b Defer persistence of rollout file (#11028)
- Defer rollout persistence for fresh threads (`InitialHistory::New`):
keep rollout events in memory and only materialize rollout file + state
DB row on first `EventMsg::UserMessage`.
- Keep precomputed rollout path available before materialization.
- Change `thread/start` to build thread response from live config
snapshot and optional precomputed path.
- Improve pre-materialization behavior in app-server/TUI: clearer
invalid-request errors for file-backed ops and a friendlier `/fork` “not
ready yet” UX.
- Update tests to match deferred semantics across
start/read/archive/unarchive/fork/resume/review flows.
- Improved resilience of user_shell test, which should be unrelated to
this change but must be affected by timing changes

For Reviewers:
* The primary change is in recorder.rs
* Most of the other changes were to fix up broken assumptions in
existing tests

Testing:
* Manually tested CLI
* Exercised app server paths by manually running IDE Extension with
rebuilt CLI binary
* Only user-visible change is that `/fork` in TUI generates visible
error if used prior to first turn
2026-02-07 23:05:03 -08:00
Michael Bolin
a118494323 feat: add support for allowed_web_search_modes in requirements.toml (#10964)
This PR makes it possible to disable live web search via an enterprise
config even if the user is running in `--yolo` mode (though cached web
search will still be available). To do this, create
`/etc/codex/requirements.toml` as follows:

```toml
# "live" is not allowed; "disabled" is allowed even though not listed explicitly.
allowed_web_search_modes = ["cached"]
```

Or set `requirements_toml_base64` MDM as explained on
https://developers.openai.com/codex/security/#locations.

### Why
- Enforce admin/MDM/`requirements.toml` constraints on web-search
behavior, independent of user config and per-turn sandbox defaults.
- Ensure per-turn config resolution and review-mode overrides never
crash when constraints are present.

### What
- Add `allowed_web_search_modes` to requirements parsing and surface it
in app-server v2 `ConfigRequirements` (`allowedWebSearchModes`), with
fixtures updated.
- Define a requirements allowlist type (`WebSearchModeRequirement`) and
normalize semantics:
  - `disabled` is always implicitly allowed (even if not listed).
  - An empty list is treated as `["disabled"]`.
- Make `Config.web_search_mode` a `Constrained<WebSearchMode>` and apply
requirements via `ConstrainedWithSource<WebSearchMode>`.
- Update per-turn resolution (`resolve_web_search_mode_for_turn`) to:
- Prefer `Live → Cached → Disabled` when
`SandboxPolicy::DangerFullAccess` is active (subject to requirements),
unless the user preference is explicitly `Disabled`.
- Otherwise, honor the user’s preferred mode, falling back to an allowed
mode when necessary.
- Update TUI `/debug-config` and app-server mapping to display
normalized `allowed_web_search_modes` (including implicit `disabled`).
- Fix web-search integration tests to assert cached behavior under
`SandboxPolicy::ReadOnly` (since `DangerFullAccess` legitimately prefers
`live` when allowed).
2026-02-07 05:55:15 +00:00
Ahmed Ibrahim
ba8b5d9018 Treat compaction failure as failure state (#10927)
- Return compaction errors from local and remote compaction flows.\n-
Stop turns/tasks when auto-compaction fails instead of continuing
execution.
2026-02-06 13:51:46 -08:00
Eric Traut
dd80e332c4 Removed the "remote_compaction" feature flag (#10840)
This feature is always on now
2026-02-05 23:54:57 -08:00
jif-oai
97582ac52d Allow user shell commands to run alongside active turns (#10513)
Summary
- refactor user shell command execution into a shared helper and add
modes for standalone vs active-turn execution
- run user shell commands asynchronously when a turn is already active
so they don’t replace or abort the current turn
- extend the tests to cover the new behavior and add the generated Codex
environment manifest

Testing
- Not run (not requested)
2026-02-05 11:11:00 +00:00
pakrym-oai
7f20357611 Stop client from being state carrier (#10595)
I'd like to make client session wide. This requires shedding all random
state it has to carry.
2026-02-04 09:05:37 -08:00
Max Johnson
66b196a725 Inject CODEX_THREAD_ID into the terminal environment (#10096)
Inject CODEX_THREAD_ID (when applicable) into the terminal environment
so that the agent (and skills) can refer to the current thread / session
ID.

Discussion:
https://openai.slack.com/archives/C095U48JNL9/p1769542492067109
2026-02-03 11:31:12 -08:00
sayan-oai
fc05374344 chore: add phase to message responseitem (#10455)
### What

add wiring for `phase` field on `ResponseItem::Message` to lay
groundwork for differentiating model preambles and final messages.
currently optional.

follows pattern in #9698.

updated schemas with `just write-app-server-schema` so we can see type
changes.

### Tests
Updated existing tests for SSE parsing and hydrating from history
2026-02-03 02:52:26 +00:00
pakrym-oai
03fcd12e77 Do not append items on override turn context (#10354) 2026-02-01 18:51:26 -08:00
Charley Cunningham
ec4a2d07e4 Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786)
## Summary
- Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed
in core, emitting plan deltas plus a plan `ThreadItem`, while stripping
tags from normal assistant output.
- Persist plan items and rebuild them on resume so proposed plans show
in thread history.
- Wire plan items/deltas through app-server protocol v2 and render a
dedicated proposed-plan view in the TUI, including the “Implement this
plan?” prompt only when a plan item is present.

## Changes

### Core (`codex-rs/core`)
- Added a generic, line-based tag parser that buffers each line until it
can disprove a tag prefix; implements auto-close on `finish()` for
unterminated tags. `codex-rs/core/src/tagged_block_parser.rs`
- Refactored proposed plan parsing to wrap the generic parser.
`codex-rs/core/src/proposed_plan_parser.rs`
- In plan mode, stream assistant deltas as:
  - **Normal text** → `AgentMessageContentDelta`
  - **Plan text** → `PlanDelta` + `TurnItem::Plan` start/completion  
  (`codex-rs/core/src/codex.rs`)
- Final plan item content is derived from the completed assistant
message (authoritative), not necessarily the concatenated deltas.
- Strips `<proposed_plan>` blocks from assistant text in plan mode so
tags don’t appear in normal messages.
(`codex-rs/core/src/stream_events_utils.rs`)
- Persist `ItemCompleted` events only for plan items for rollout replay.
(`codex-rs/core/src/rollout/policy.rs`)
- Guard `update_plan` tool in Plan Mode with a clear error message.
(`codex-rs/core/src/tools/handlers/plan.rs`)
- Updated Plan Mode prompt to:  
  - keep `<proposed_plan>` out of non-final reasoning/preambles  
  - require exact tag formatting  
  - allow only one `<proposed_plan>` block per turn  
  (`codex-rs/core/templates/collaboration_mode/plan.md`)

### Protocol / App-server protocol
- Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items.
(`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`)
- Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with
EXPERIMENTAL markers and note that deltas may not match the final plan
item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`)
- Added plan delta route in app-server protocol common mapping.
(`codex-rs/app-server-protocol/src/protocol/common.rs`)
- Rebuild plan items from persisted `ItemCompleted` events on resume.
(`codex-rs/app-server-protocol/src/protocol/thread_history.rs`)

### App-server
- Forward plan deltas to v2 clients and map core plan items to v2 plan
items. (`codex-rs/app-server/src/bespoke_event_handling.rs`,
`codex-rs/app-server/src/codex_message_processor.rs`)
- Added v2 plan item tests.
(`codex-rs/app-server/tests/suite/v2/plan_item.rs`)

### TUI
- Added a dedicated proposed plan history cell with special background
and padding, and moved “• Proposed Plan” outside the highlighted block.
(`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`)
- Only show “Implement this plan?” when a plan item exists.
(`codex-rs/tui/src/chatwidget.rs`,
`codex-rs/tui/src/chatwidget/tests.rs`)

<img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM"
src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286"
/>

### Docs / Misc
- Updated protocol docs to mention plan deltas.
(`codex-rs/docs/protocol_v1.md`)
- Minor plumbing updates in exec/debug clients to tolerate plan deltas.
(`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`)

## Tests
- Added core integration tests:
  - Plan mode strips plan from agent messages.
  - Missing `</proposed_plan>` closes at end-of-message.  
  (`codex-rs/core/tests/suite/items.rs`)
- Added unit tests for generic tag parser (prefix buffering, non-tag
lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`)
- Existing app-server plan item tests in v2.
(`codex-rs/app-server/tests/suite/v2/plan_item.rs`)

## Notes / Behavior
- Plan output no longer appears in standard assistant text in Plan Mode;
it streams via `PlanDelta` and completes as a `TurnItem::Plan`.
- The final plan item content is authoritative and may diverge from
streamed deltas (documented as experimental).
- Reasoning summaries are not filtered; prompt instructs the model not
to include `<proposed_plan>` outside the final plan message.

## Codex Author
`codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`
2026-01-30 18:59:30 +00:00
jif-oai
798c4b3260 feat: reduce span exposition (#10171)
This only avoids the creation of duplicates spans
2026-01-29 18:15:22 +00:00
jif-oai
89c5f3c4d4 feat: adding thread ID to logs + filter in the client (#10150) 2026-01-29 16:53:30 +01:00
sayan-oai
a90ab789c2 fix: enable per-turn updates to web search mode (#10040)
web_search can now be updated per-turn, for things like changes to
sandbox policy.

`SandboxPolicy::DangerFullAccess` now sets web_search to `live`, and the
default is still `cached`.

Added integration tests.
2026-01-27 18:09:29 -08:00
iceweasel-oai
c40ad65bd8 remove sandbox globals. (#9797)
Threads sandbox updates through OverrideTurnContext for active turn
Passes computed sandbox type into safety/exec
2026-01-27 11:04:23 -08:00
Owen Lin
fc0fd85349 fix(app-server, core): defer initial context write to rollout file until first turn (#9950)
### Overview
Currently calling `thread/resume` will always bump the thread's
`updated_at` timestamp. This PR makes it the `updated_at` timestamp
changes only if a turn is triggered.

### Additonal context
What we typically do on resuming a thread is **always** writing “initial
context” to the rollout file immediately. This initial context includes:
- Developer instructions derived from sandbox/approval policy + cwd
- Optional developer instructions (if provided)
- Optional collaboration-mode instructions
- Optional user instructions (if provided)
- Environment context (cwd, shell, etc.)

This PR defers writing the “initial context” to the rollout file until
the first `turn/start`, so we don't inadvertently bump the thread's
`updated_at` timestamp until a turn is actually triggered.

This works even though both `thread/resume` and `turn/start` accept
overrides (such as `model`, `cwd`, etc.) because the initial context is
seeded from the effective `TurnContext` in memory, computed at
`turn/start` time, after both sets of overrides have been applied.

**NOTE**: This is a very short-lived solution until we introduce sqlite.
Then we can remove this.
2026-01-27 10:41:54 -08:00
sayan-oai
0adcd8aa86 make cached web_search client-side default (#9974)
[Experiment](https://console.statsig.com/50aWbk2p4R76rNX9lN5VUw/experiments/codex_web_search_rollout/summary)
for default cached `web_search` completed; cached chosen as default.

Update client to reflect that.
2026-01-26 21:25:40 -08:00
jif-oai
09251387e0 chore: update interrupt message (#9925) 2026-01-26 19:07:54 +00:00
pakrym-oai
b511c38ddb Support end_turn flag (#9698)
Experimental flag that signals the end of the turn.
2026-01-22 17:27:48 +00:00
jif-oai
a22a61e678 feat: display raw command on user shell (#9598) 2026-01-21 09:44:38 +00:00
Skylar Graika
b236f1c95d fix: prevent repeating interrupted turns (#9043)
## What
Record a model-visible `<turn_aborted>` marker in history when a turn is
interrupted, and treat it as a session prefix.

## Why
When a turn is interrupted, Codex emits `TurnAborted` but previously did
not persist anything model-visible in the conversation history. On the
next user turn, the model can’t tell the previous work was aborted and
may resume/repeat earlier actions (including duplicated side effects
like re-opening PRs).

Fixes: https://github.com/openai/codex/issues/9042

## How
On `TurnAbortReason::Interrupted`, append a hidden user message
containing a `<turn_aborted>…</turn_aborted>` marker and flush.
Treat `<turn_aborted>` like `<environment_context>` for session-prefix
filtering.
Add a regression test to ensure follow-up turns don’t repeat side
effects from an aborted turn.

## Testing
`just fmt`
`just fix -p codex-core`
`cargo test -p codex-core -- --test-threads=1`
`cargo test --all-features -- --test-threads=1`

---------

Co-authored-by: Skylar Graika <sgraika127@gmail.com>
Co-authored-by: jif-oai <jif@openai.com>
Co-authored-by: Eric Traut <etraut@openai.com>
2026-01-20 13:07:28 -08:00
Ahmed Ibrahim
b11e96fb04 Act on reasoning-included per turn (#9402)
- Reset reasoning-included flag each turn and update compaction test
2026-01-19 11:23:25 -08:00
Eric Traut
264d40efdc Fix invalid input error on Azure endpoint (#9387)
Users of Azure endpoints are reporting that when they use `/review`,
they sometimes see an error "Invalid 'input[3].id". I suspect this is
specific to the Azure implementation of the `responses` API. The Azure
team generally copies the OpenAI code for this endpoint, but they do
have minor differences and sometimes lag in rolling out bug fixes or
updates.

The error appears to be triggered because the `/review` implementation
is using a user ID with a colon in it.

Addresses #9360
2026-01-19 08:21:27 -08:00
jif-oai
7ebe13f692 feat: timer total turn metrics (#9382) 2026-01-19 10:44:31 +00:00
jif-oai
c1ac5223e1 feat: run user commands under user snapshot (#9357)
The initial goal is for user snapshots to have access to aliases etc
2026-01-16 11:49:28 +01:00
sayan-oai
169201b1b5 [search] allow explicitly disabling web search (#9249)
moving `web_search` rollout serverside, so need a way to explicitly
disable search + signal eligibility from the client.

- Add `x‑oai‑web‑search‑eligible` header that signifies whether the
request can have web search.
- Only attach the `web_search` tool when the resolved `WebSearchMode` is
`Live` or `Cached`.
2026-01-15 11:28:57 -08:00
sayan-oai
5e426ac270 add WebSearchMode enum (#9216)
### What
Add `WebSearchMode` enum (disabled, cached live, defaults to cached) to
config + V2 protocol. This enum takes precedence over legacy flags:
`web_search_cached`, `web_search_request`, and `tools.web_search`.

Keep `--search` as live.

### Tests
Added tests
2026-01-14 12:51:42 -08:00
pakrym-oai
92472e7baa Use current model for review (#9179)
Instead of having a hard-coded default review model, use the current
model for running `/review` unless one is specified in the config.

Also inherit current reasoning effort
2026-01-14 08:59:41 -08:00
Anton Panasenko
51d75bb80a fix: drop session span at end of the session (#9126) 2026-01-13 11:36:00 -08:00
jif-oai
1aed01e99f renaming: task to turn (#8963) 2026-01-09 17:31:17 +00:00
jif-oai
69898e3dba clean: all history cloning (#8916) 2026-01-08 18:17:18 +00:00
jif-oai
5522663f92 feat: add a few metrics (#8910) 2026-01-08 15:39:57 +00:00
Thibault Sottiaux
be212db0c8 fix: include project instructions in /review subagent (#8899)
Include project-level AGENTS.md and skills in /review sessions so the
review sub-agent uses the same instruction pipeline as standard runs,
keeping reviewer context aligned with normal sessions.
2026-01-08 13:31:01 +00:00
jif-oai
634650dd25 feat: metrics capabilities (#8318)
Add metrics capabilities to Codex. The `README.md` is up to date.

This will not be merged with the metrics before this PR of course:
https://github.com/openai/codex/pull/8350
2026-01-08 11:47:36 +00:00
jif-oai
1253d19641 chore: drop useless feature flags (#8850) 2026-01-07 19:54:32 +00:00
jif-oai
116059c3a0 chore: unify conversation with thread name (#8830)
Done and verified by Codex + refactor feature of RustRover
2026-01-07 17:04:53 +00:00
jif-oai
4cef89a122 chore: rename unified exec sessions (#8822)
Renaming done by Codex
2026-01-07 16:12:47 +00:00
Ahmed Ibrahim
f0dc6fd3c7 Rename OpenAI models to models manager (#8346)
# External (non-OpenAI) Pull Request Requirements

Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md

If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.

Include a link to a bug report or enhancement request.
2025-12-19 16:20:05 -08:00
jif-oai
3d92b443b0 feat: add config to disable warnings around ghost snapshot (#8178) 2025-12-17 18:50:22 +00:00
Eric Traut
bbc5675974 Revert "chore: review in read-only (#7593)" (#8127)
This reverts commit 291b54a762.

This commit was intended to prevent the model from making code changes
during `/review`, which is sometimes does. Unfortunately, it has other
unintended side effects that cause `/review` to fail in a variety of
ways. See #8115 and #7815. We've therefore decided to revert this
change.
2025-12-16 12:01:54 -08:00
jif-oai
d7482510b1 nit: trace span for regular task (#8053)
Logs are too spammy

---------

Co-authored-by: Anton Panasenko <apanasenko@openai.com>
2025-12-16 16:53:15 +00:00
jif-oai
ae57e18947 feat: close unified_exec at end of turn (#8052) 2025-12-16 12:16:43 +00:00
jif-oai
0d9801d448 feat: ghost snapshot v2 (#8055)
This PR updates ghost snapshotting to avoid capturing oversized
untracked artifacts while keeping undo safe. Snapshot creation now
builds a temporary index from `git status --porcelain=2 -z`, writes a
tree and detached commit without touching refs, and records any ignored
large files/dirs in the snapshot report. Undo uses that metadata to
preserve large local artifacts while still cleaning up new transient
files.
2025-12-15 11:14:36 +01:00
jif-oai
4274e6189a feat: config ghost commits (#7873) 2025-12-15 09:13:06 +01:00
Anton Panasenko
ad7b9d63c3 [codex] add otel tracing (#7844) 2025-12-12 17:07:17 -08:00
pakrym-oai
b3ddd50eee Remote compact for API-key users (#7835) 2025-12-12 10:05:02 -08:00
jif-oai
b2280d6205 feat: warning for long snapshots (#7870) 2025-12-11 12:42:47 +00:00