Commit Graph

679 Commits

Author SHA1 Message Date
jif-oai
f6c06108b1 try fix 2 (#12264) 2026-02-19 19:36:42 +00:00
jif-oai
928be5f515 Revert "feat: no timeout mode on ue" (#12256)
Reverts openai/codex#12250
2026-02-19 19:02:29 +00:00
jif-oai
9719dc502c feat: no timeout mode on ue (#12250) 2026-02-19 18:58:13 +00:00
sayan-oai
d54999d006 client side modelinfo overrides (#12101)
TL;DR
Add top-level `model_catalog_json` config support so users can supply a
local model catalog override from a JSON file path (including adding new
models) without backend changes.

### Problem
Codex previously had no clean client-side way to replace/overlay model
catalog data for local testing of model metadata and new model entries.

### Fix
- Add top-level `model_catalog_json` config field (JSON file path).
- Apply catalog entries when resolving `ModelInfo`:
  1. Base resolved model metadata (remote/fallback)
  2. Catalog overlay from `model_catalog_json`
3. Existing global top-level overrides (`model_context_window`,
`model_supports_reasoning_summaries`, etc.)

### Note
Will revisit per-field overrides in a follow-up

### Tests
Added tests
2026-02-19 10:38:57 -08:00
jif-oai
2daa3fd44f feat: sub-agent injection (#12152)
This PR adds parent-thread sub-agent completion notifications and change
the prompt of the model to prevent if from being confused
2026-02-19 11:32:10 +00:00
Eric Traut
227352257c Update docs links for feature flag notice (#12164)
Summary
- replace the stale `docs/config.md#feature-flags` reference in the
legacy feature notice with the canonical published URL
- align the deprecation notice test to expect the new link

This addresses #12123
2026-02-19 00:00:44 -08:00
Eric Traut
999576f7b8 Fixed a hole in token refresh logic for app server (#11802)
We've continued to receive reports from users that they're seeing the
error message "Your access token could not be refreshed because your
refresh token was already used. Please log out and sign in again." This
PR fixes two holes in the token refresh logic that lead to this
condition.

Background: A previous change in token refresh introduced the
`UnauthorizedRecovery` object. It implements a state machine in the core
agent loop that first performs a load of the on-disk auth information
guarded by a check for matching account ID. If it finds that the on-disk
version has been updated by another instance of codex, it uses the
reloaded auth tokens. If the on-disk version hasn't been updated, it
issues a refresh request from the token authority.

There are two problems that this PR addresses:

Problem 1: We weren't doing the same thing for the code path used by the
app server interface. This PR effectively replicates the
`UnauthorizedRecovery` logic for that code path.

Problem 2: The `UnauthorizedRecovery` logic contained a hole in the
`ReloadOutcome::Skipped` case. Here's the scenario. A user starts two
instances of the CLI. Instance 1 is active (working on a task), instance
2 is idle. Both instances have the same in-memory cached tokens. The
user then runs `codex logout` or `codex login` to log in to a separate
account, which overwrites the `auth.json` file. Instance 1 receives a
401 and refreshes its token, but it doesn't write the new token to the
`auth.json` file because the account ID doesn't match. Instance 2 is
later activated and presented with a new task. It immediately hits a 401
and attempts to refresh its token but fails because its cached refresh
token is now invalid. To avoid this situation, I've changed the logic to
immediately fail a token refresh if the user has since logged out or
logged in to another account. This will still be seen as an error by the
user, but the cause will be clearer.

I also took this opportunity to clean up the names of existing functions
to make their roles clearer.
* `try_refresh_token` is renamed `request_chatgpt_token_refresh`
* the existing `refresh_token` is renamed `refresh_token_from_authority`
(there's a new higher-level function named `refresh_token` now)
* `refresh_tokens` is renamed `refresh_and_persist_chatgpt_token`, and
it now implicitly reloads
* `update_tokens` is renamed `persist_tokens`
2026-02-18 09:27:04 -08:00
Charley Cunningham
c16f9daaaf Add model-visible context layout snapshot tests (#12073)
## Summary
- add a dedicated `core/tests/suite/model_visible_layout.rs` snapshot
suite to materialize model-visible request layout in high-value
scenarios
- add three reviewer-focused snapshot scenarios:
  - turn-level context updates (cwd / permissions / personality)
  - first post-resume turn with model hydration + personality change
- first post-resume turn where pre-turn model override matches rollout
model
- wire the new suite into `core/tests/suite/mod.rs`
- commit generated `insta` snapshots under `core/tests/suite/snapshots/`

## Why
This creates a stable, reviewable baseline of model-visible context
layout against `main` before follow-on context-management refactors. It
lets subsequent PRs show focused snapshot diffs for behavior changes
instead of introducing the test surface and behavior changes at once.

## Testing
- `just fmt`
- `INSTA_UPDATE=always cargo test -p codex-core model_visible_layout`
2026-02-17 22:30:29 -08:00
pakrym-oai
fc810ba045 Use V2 websockets if feature enabled (#12071) 2026-02-17 18:32:16 -08:00
Charley Cunningham
eb68767f2f Unify remote compaction snapshot mocks around default endpoint behavior (#12050)
## Summary
- standardize remote compaction test mocking around one default behavior
in shared helpers
- make default remote compact mocks mirror production shape: keep
`message/user` + `message/developer`, drop assistant/tool artifacts,
then append a summary user message
- switch non-special `compact_remote` tests to the shared default mock
instead of ad-hoc JSON payloads

## Special-case tests that still use explicit mocks
- remote compaction error payload / HTTP failure behavior
- summary-only compact output behavior
- manual `/compact` with no prior user messages
- stale developer-instruction injection coverage

## Why
This removes inconsistent manual remote compaction fixtures and gives us
one source of truth for normal remote compact behavior, while preserving
explicit mocks only where tests intentionally cover non-default
behavior.
2026-02-17 18:18:47 -08:00
Owen Lin
db4d2599b5 feat(core): plumb distinct approval ids for command approvals (#12051)
zsh fork PR stack:
- https://github.com/openai/codex/pull/12051 👈 
- https://github.com/openai/codex/pull/12052

With upcoming support for a fork of zsh that allows us to intercept
`execve` and run execpolicy checks for each subcommand as part of a
`CommandExecution`, it will be possible for there to be multiple
approval requests for a shell command like `/path/to/zsh -lc 'git status
&& rg \"TODO\" src && make test'`.

To support that, this PR introduces a new `approval_id` field across
core, protocol, and app-server so that we can associate approvals
properly for subcommands.
2026-02-18 01:55:57 +00:00
Shijie Rao
b3a8571219 Chore: remove response model check and rely on header model for downgrade (#12061)
### Summary
Ensure that we use the model value from the response header only so that
we are guaranteed with the correct slug name. We are no longer checking
against the model value from response so that we are less likely to have
false positive.

There are two different treatments - for SSE we use the header from the
response and for websocket we check top-level events.
2026-02-18 01:50:06 +00:00
sayan-oai
41800fc876 chore: rm remote models fflag (#11699)
rm `remote_models` feature flag.

We see issues like #11527 when a user has `remote_models` disabled, as
we always use the default fallback `ModelInfo`. This causes issues with
model performance.

Builds on #11690, which helps by warning the user when they are using
the default fallback. This PR will make that happen much less frequently
as an accidental consequence of disabling `remote_models`.
2026-02-17 11:43:16 -08:00
Shijie Rao
48018e9eac Feat: add model reroute notification (#12001)
### Summary
Builiding off
5c75aa7b89 (diff-058ae8f109a8b84b4b79bbfa45f522c2233b9d9e139696044ae374d50b6196e0),
we have created a `model/rerouted` notification that captures the event
so that consumers can render as expected. Keep the `EventMsg::Warning`
path in core so that this does not affect TUI rendering.

`model/rerouted` is meant to be generic to account for future usage
including capacity planning etc.
2026-02-17 11:02:23 -08:00
sayan-oai
a1b8e34938 chore: clarify web_search deprecation notices and consolidate tests (#11224)
follow up to #10406, clarify default-enablement of web_search.

also consolidate pseudo-redundant tests

Tests pass
2026-02-17 18:20:24 +00:00
Dylan Hurd
0fbe10a807 fix(core) exec_policy parsing fixes (#11951)
## Summary
Fixes a few things in our exec_policy handling of prefix_rules:
1. Correctly match redirects specifically for exec_policy parsing. i.e.
if you have `prefix_rule(["echo"], decision="allow")` then `echo hello >
output.txt` should match - this should fix #10321
2. If there already exists any rule that would match our prefix rule
(not just a prompt), then drop it, since it won't do anything.


## Testing
- [x] Updated unit tests, added approvals ScenarioSpecs
2026-02-16 23:11:59 -08:00
Fouad Matin
02e9006547 add(core): safety check downgrade warning (#11964)
Add per-turn notice when a request is downgraded to a fallback model due
to cyber safety checks.

**Changes**

- codex-api: Emit a ServerModel event based on the openai-model response
header and/or response payload (SSE + WebSocket), including when the
model changes mid-stream.
- core: When the server-reported model differs from the requested model,
emit a single per-turn warning explaining the reroute to gpt-5.2 and
directing users to Trusted
    Access verification and the cyber safety explainer.
- app-server (v2): Surface these cyber model-routing warnings as
synthetic userMessage items with text prefixed by Warning: (and document
this behavior).
2026-02-16 22:13:36 -08:00
Dylan Hurd
19afbc35c1 chore(core) rm Feature::RequestRule (#11866)
## Summary
This feature is now reasonably stable, let's remove it so we can
simplify our upcoming iterations here.

## Testing 
- [x] Existing tests pass
2026-02-16 22:30:23 +00:00
jif-oai
af434b4f71 feat: drop MCP managing tools if no MCP servers (#11900)
Drop MCP tools if no MCP servers to save context

For this https://github.com/openai/codex/issues/11049
2026-02-16 18:40:45 +00:00
jif-oai
825a4af42f feat: use shell policy in shell snapshot (#11759)
Honor `shell_environment_policy.set` even after a shell snapshot
2026-02-16 09:11:00 +00:00
Anton Panasenko
1d95656149 bazel: fix snapshot parity for tests/*.rs rust_test targets (#11893)
## Summary
- make `rust_test` targets generated from `tests/*.rs` use Cargo-style
crate names (file stem) so snapshot names match Cargo (`all__...`
instead of Bazel-derived names)
- split lib vs `tests/*.rs` test env wiring in `codex_rust_crate` to
keep existing lib snapshot behavior while applying Bazel
runfiles-compatible workspace root for `tests/*.rs`
- compute the `tests/*.rs` snapshot workspace root from package depth so
`insta` resolves committed snapshots under Bazel `--noenable_runfiles`

## Validation
- `bazelisk test //codex-rs/core:core-all-test
--test_arg=suite::compact:: --cache_test_results=no`
- `bazelisk test //codex-rs/core:core-all-test
--test_arg=suite::compact_remote:: --cache_test_results=no`
2026-02-16 07:11:59 +00:00
Anton Panasenko
02abd9a8ea feat: persist and restore codex app's tools after search (#11780)
### What changed
1. Removed per-turn MCP selection reset in `core/src/tasks/mod.rs`.
2. Added `SessionState::set_mcp_tool_selection(Vec<String>)` in
`core/src/state/session.rs` for authoritative restore behavior (deduped,
order-preserving, empty clears).
3. Added rollout parsing in `core/src/codex.rs` to recover
`active_selected_tools` from prior `search_tool_bm25` outputs:
   - tracks matching `call_id`s
   - parses function output text JSON
   - extracts `active_selected_tools`
   - latest valid payload wins
   - malformed/non-matching payloads are ignored
4. Applied restore logic to resumed and forked startup paths in
`core/src/codex.rs`.
5. Updated instruction text to session/thread scope in
`core/templates/search_tool/tool_description.md`.
6. Expanded tests in `core/tests/suite/search_tool.rs`, plus unit
coverage in:
   - `core/src/codex.rs`
   - `core/src/state/session.rs`

### Behavior after change
1. Search activates matched tools.
2. Additional searches union into active selection.
3. Selection survives new turns in the same thread.
4. Resume/fork restores selection from rollout history.
5. Separate threads do not inherit selection unless forked.
2026-02-15 19:18:41 -08:00
sayan-oai
060a320e7d fix: show user warning when using default fallback metadata (#11690)
### What
It's currently unclear when the harness falls back to the default,
generic `ModelInfo`. This happens when the `remote_models` feature is
disabled or the model is truly unknown, and can lead to bad performance
and issues in the harness.

Add a user-facing warning when this happens so they are aware when their
setup is broken.

### Tests
Added tests, tested locally.
2026-02-15 18:46:05 -08:00
Charley Cunningham
85034b189e core: snapshot tests for compaction requests, post-compaction layout, some additional compaction tests (#11487)
This PR keeps compaction context-layout test coverage separate from
runtime compaction behavior changes, so runtime logic review can stay
focused.

## Included
- Adds reusable context snapshot helpers in
`core/tests/common/context_snapshot.rs` for rendering model-visible
request/history shapes.
- Standardizes helper naming for readability:
  - `format_request_input_snapshot`
  - `format_response_items_snapshot`
  - `format_labeled_requests_snapshot`
  - `format_labeled_items_snapshot`
- Expands snapshot coverage for both local and remote compaction flows:
  - pre-turn auto-compaction
  - pre-turn failure/context-window-exceeded paths
  - mid-turn continuation compaction
  - manual `/compact` with and without prior user turns
- Captures both sides where relevant:
  - compaction request shape
  - post-compaction history layout shape
- Adds/uses shared request-inspection helpers so assertions target
structured request content instead of ad-hoc JSON string parsing.
- Aligns snapshots/assertions to current behavior and leaves explicit
`TODO(ccunningham)` notes where behavior is known and intentionally
deferred.

## Not Included
- No runtime compaction logic changes.
- No model-visible context/state behavior changes.
2026-02-14 19:57:10 -08:00
viyatb-oai
b527ee2890 feat(core): add structured network approval plumbing and policy decision model (#11672)
### Description
#### Summary
Introduces the core plumbing required for structured network approvals

#### What changed
- Added structured network policy decision modeling in core.
- Added approval payload/context types needed for network approval
semantics.
- Wired shell/unified-exec runtime plumbing to consume structured
decisions.
- Updated related core error/event surfaces for structured handling.
- Updated protocol plumbing used by core approval flow.
- Included small CLI debug sandbox compatibility updates needed by this
layer.

#### Why
establishes the minimal backend foundation for network approvals without
yet changing high-level orchestration or TUI behavior.

#### Notes
- Behavior remains constrained by existing requirements/config gating.
- Follow-up PRs in the stack handle orchestration, UX, and app-server
integration.

---------

Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
2026-02-14 04:18:12 +00:00
Charley Cunningham
67e577da53 Handle model-switch base instructions after compaction (#11659)
Strip trailing <model_switch> during model-switch compaction request,
and append <model_switch> after model switch compaction
2026-02-13 19:02:53 -08:00
pash-openai
6c0a924203 turn metadata: per-turn non-blocking (#11677) 2026-02-13 12:48:29 -08:00
Michael Bolin
2383978a2c fix: reduce flakiness of compact_resume_after_second_compaction_preserves_history (#11663)
## Why
`compact_resume_after_second_compaction_preserves_history` has been
intermittently flaky in Windows CI.

The test had two one-shot request matchers in the second compact/resume
phase that could overlap, and it waited for the first `Warning` event
after compaction. In practice, that made the test sensitive to
platform/config-specific prompt shape and unrelated warning timing.

## What Changed
- Hardened the second compaction matcher in
`codex-rs/core/tests/suite/compact_resume_fork.rs` so it accepts
expected compact-request variants while explicitly excluding the
`AFTER_SECOND_RESUME` payload.
- Updated `compact_conversation()` to wait for the specific compaction
warning (`COMPACT_WARNING_MESSAGE`) rather than any `Warning` event.
- Added an inline comment explaining why the matcher is intentionally
broad but disjoint from the follow-up resume matcher.

## Test Plan
- `cargo test -p codex-core --test all
suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
-- --exact`
- Repeated the same test in a loop (40 runs) to check for local
nondeterminism.
2026-02-13 09:51:22 -08:00
Anton Panasenko
38c442ca7f core: limit search_tool_bm25 to Apps and clarify discovery guidance (#11669)
## Summary
- Limit `search_tool_bm25` indexing to `codex_apps` tools only, so
non-Apps MCP servers are no longer discoverable through this search
path.
- Move search-tool discovery guidance into the `search_tool_bm25` tool
description (via template include) instead of injecting it as a separate
developer message.
- Update Apps discovery guidance wording to clarify when to use
`search_tool_bm25` for Apps-backed systems (for example Slack, Google
Drive, Jira, Notion) and when to call tools directly.
- Remove dead `core` helper code (`filter_codex_apps_mcp_tools` and
`codex_apps_connector_id`) that is no longer used after the
tool-selection refactor.
- Update `core` search-tool tests to assert codex-apps-only behavior and
to validate guidance from the tool description.

## Validation
-  `just fmt`
-  `cargo test -p codex-core search_tool`
- ⚠️ `cargo test -p codex-core` was attempted, but the run repeatedly
stalled on
`tools::js_repl::tests::js_repl_can_attach_image_via_view_image_tool`.

## Tickets
- None
2026-02-13 09:32:46 -08:00
Dylan Hurd
35692e99c1 chore(approvals) More approvals scenarios (#11660)
## Summary
Add some additional tests to approvals flow

## Testing
- [x] these are tests
2026-02-12 19:54:54 -08:00
Charley Cunningham
f24669d444 Persist complete TurnContextItem state via canonical conversion (#11656)
## Summary

This PR delivers the first small, shippable step toward model-visible
state diffing by making
`TurnContextItem` more complete and standardizing how it is built.

Specifically, it:
- Adds persisted network context to `TurnContextItem`.
- Introduces a single canonical `TurnContext -> TurnContextItem`
conversion path.
- Routes existing rollout write sites through that canonical conversion
helper.

No context injection/diff behavior changes are included in this PR.

## Why this change

The design goal is to make `TurnContextItem` the canonical source of
truth for context-diff
decisions.
Before this PR:
- `TurnContextItem` did not include all TurnContext-derived environment
inputs needed for v1
completeness.
- Construction was duplicated at multiple write sites.

This PR addresses both with a minimal, reviewable change.

## Changes

### 1) Extend `TurnContextItem` with network state
- Added `TurnContextNetworkItem { allowed_domains, denied_domains }`.
- Added `network: Option<TurnContextNetworkItem>` to `TurnContextItem`.
- Kept backward compatibility by making the new field optional and
skipped when absent.

Files:
- `codex-rs/protocol/src/protocol.rs`

### 2) Canonical conversion helper
- Added `TurnContext::to_turn_context_item(collaboration_mode)` in core.
- Added internal helper to derive network fields from
`config_layer_stack.requirements().network`.

Files:
- `codex-rs/core/src/codex.rs`

### 3) Use canonical conversion at rollout write sites
- Replaced ad hoc `TurnContextItem { ... }` construction with
`to_turn_context_item(...)` in:
  - sampling request path
  - compaction path

Files:
- `codex-rs/core/src/codex.rs`
- `codex-rs/core/src/compact.rs`

### 4) Update fixtures/tests for new optional field
- Updated existing `TurnContextItem` literals in tests to include
`network: None`.
- Added protocol tests for:
  - deserializing old payloads with no `network`
  - serializing when `network` is present

Files:
- `codex-rs/core/tests/suite/resume_warning.rs`
- No replay/diff logic changes.
- Persisted rollout `TurnContextItem` now carries additional network
context when available.
- Older rollout lines without `network` remain readable.
2026-02-12 17:22:44 -08:00
Michael Bolin
a4cc1a4a85 feat: introduce Permissions (#11633)
## Why
We currently carry multiple permission-related concepts directly on
`Config` for shell/unified-exec behavior (`approval_policy`,
`sandbox_policy`, `network`, `shell_environment_policy`,
`windows_sandbox_mode`).

Consolidating these into one in-memory struct makes permission handling
easier to reason about and sets up the next step: supporting named
permission profiles (`[permissions.PROFILE_NAME]`) without changing
behavior now.

This change is mostly mechanical: it updates existing callsites to go
through `config.permissions`, but it does not yet refactor those
callsites to take a single `Permissions` value in places where multiple
permission fields are still threaded separately.

This PR intentionally **does not** change the on-disk `config.toml`
format yet and keeps compatibility with legacy config keys.

## What Changed
- Introduced `Permissions` in `core/src/config/mod.rs`.
- Added `Config::permissions` and moved effective runtime permission
fields under it:
  - `approval_policy`
  - `sandbox_policy`
  - `network`
  - `shell_environment_policy`
  - `windows_sandbox_mode`
- Updated config loading/building so these effective values are still
derived from the same existing config inputs and constraints.
- Updated Windows sandbox helpers/resolution to read/write via
`permissions`.
- Threaded the new field through all permission consumers across core
runtime, app-server, CLI/exec, TUI, and sandbox summary code.
- Updated affected tests to reference `config.permissions.*`.
- Renamed the struct/field from
`EffectivePermissions`/`effective_permissions` to
`Permissions`/`permissions` and aligned variable naming accordingly.

## Verification
- `just fix -p codex-core -p codex-tui -p codex-cli -p codex-app-server
-p codex-exec -p codex-utils-sandbox-summary`
- `cargo build -p codex-core -p codex-tui -p codex-cli -p
codex-app-server -p codex-exec -p codex-utils-sandbox-summary`
2026-02-12 14:42:54 -08:00
Curtis 'Fjord' Hawthorne
466be55abc Add js_repl host helpers and exec end events (#10672)
## Summary

This PR adds host-integrated helper APIs for `js_repl` and updates model
guidance so the agent can use them reliably.

### What’s included

- Add `codex.tool(name, args?)` in the JS kernel so `js_repl` can call
normal Codex tools.
- Keep persistent JS state and scratch-path helpers available:
  - `codex.state`
  - `codex.tmpDir`
- Wire `js_repl` tool calls through the standard tool router path.
- Add/align `js_repl` execution completion/end event behavior with
existing tool logging patterns.
- Update dynamic prompt injection (`project_doc`) to document:
  - how to call `codex.tool(...)`
  - raw output behavior
- image flow via `view_image` (`codex.tmpDir` +
`codex.tool("view_image", ...)`)
- stdio safety guidance (`console.log` / `codex.tool`, avoid direct
`process.std*`)

## Why

- Standardize JS-side tool usage on `codex.tool(...)`
- Make `js_repl` behavior more consistent with existing tool execution
and event/logging patterns.
- Give the model enough runtime guidance to use `js_repl` safely and
effectively.

## Testing

- Added/updated unit and runtime tests for:
  - `codex.tool` calls from `js_repl` (including shell/MCP paths)
  - image handoff flow via `view_image`
  - prompt-injection text for `js_repl` guidance
  - execution/end event behavior and related regression coverage




#### [git stack](https://github.com/magus/git-stack-cli)
-  `1` https://github.com/openai/codex/pull/10674
- 👉 `2` https://github.com/openai/codex/pull/10672
-  `3` https://github.com/openai/codex/pull/10671
-  `4` https://github.com/openai/codex/pull/10673
-  `5` https://github.com/openai/codex/pull/10670
2026-02-12 12:10:25 -08:00
Owen Lin
efc8d45750 feat(app-server): experimental flag to persist extended history (#11227)
This PR adds an experimental `persist_extended_history` bool flag to
app-server thread APIs so rollout logs can retain a richer set of
EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
on `thread/resume`).

### Motivation
Today, our rollout recorder only persists a small subset (e.g. user
message, reasoning, assistant message) of `EventMsg` types, dropping a
good number (like command exec, file change, etc.) that are important
for reconstructing full item history for `thread/resume`, `thread/read`,
and `thread/fork`.

Some clients want to be able to resume a thread without lossiness. This
lossiness is primarily a UI thing, since what the model sees are
`ResponseItem` and not `EventMsg`.

### Approach
This change introduces an opt-in `persist_full_history` flag to preserve
those events when you start/resume/fork a thread (defaults to `false`).

This is done by adding an `EventPersistenceMode` to the rollout
recorder:
- `Limited` (existing behavior, default)
- `Extended` (new opt-in behavior)

In `Extended` mode, persist additional `EventMsg` variants needed for
non-lossy app-server `ThreadItem` reconstruction. We now store the
following ThreadItems that we didn't before:
- web search
- command execution
- patch/file changes
- MCP tool calls
- image view calls
- collab tool outcomes
- context compaction
- review mode enter/exit

For **command executions** in particular, we truncate the output using
the existing `truncate_text` from core to store an upper bound of 10,000
bytes, which is also the default value for truncating tool outputs shown
to the model. This keeps the size of the rollout file and command
execution items returned over the wire reasonable.

And we also persist `EventMsg::Error` which we can now map back to the
Turn's status and populates the Turn's error metadata.

#### Updates to EventMsgs
To truly make `thread/resume` non-lossy, we also needed to persist the
`status` on `EventMsg::CommandExecutionEndEvent` and
`EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
command failed or was declined (similar for apply_patch). These
EventMsgs were never persisted before so I made it a required field.
2026-02-12 19:34:22 +00:00
jif-oai
545b266839 fix: fmt (#11619) 2026-02-12 18:13:00 +00:00
Dylan Hurd
f39f506700 fix(core) model_info preserves slug (#11602)
## Summary
Preserve the specified model slug when we get a prefix-based match

## Testing
- [x] added unit test

---------

Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
2026-02-12 09:43:32 -08:00
gt-oai
4027f1f1a4 Fix test flake (#11448)
Flaking with

```
   Nextest run ID 6b7ff5f7-57f6-4c9c-8026-67f08fa2f81f with nextest profile: default
      Starting 3282 tests across 118 binaries (21 tests skipped)
          FAIL [  14.548s] (1367/3282) codex-core::all suite::apply_patch_cli::apply_patch_cli_can_use_shell_command_output_as_patch_input
    stdout ───

      running 1 test
      test suite::apply_patch_cli::apply_patch_cli_can_use_shell_command_output_as_patch_input ... FAILED

      failures:

      failures:
          suite::apply_patch_cli::apply_patch_cli_can_use_shell_command_output_as_patch_input

      test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 522 filtered out; finished in 14.41s

    stderr ───

      thread 'suite::apply_patch_cli::apply_patch_cli_can_use_shell_command_output_as_patch_input' (15632) panicked at C:\a\codex\codex\codex-rs\core\tests\common\lib.rs:186:14:
      timeout waiting for event: Elapsed(())
      stack backtrace:
      read_output:
      Exit code: 0
      Wall time: 8.5 seconds
      Output:
      line1
      naïve café
      line3

      stdout:
      line1
      naïve café
      line3
      patch:
      *** Begin Patch
      *** Add File: target.txt
      +line1
      +naïve café
      +line3
      *** End Patch
      note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```
2026-02-12 09:37:24 +00:00
pakrym-oai
fd7f2aedc7 Handle response.incomplete (#11558)
Treat it same as error.
2026-02-12 00:11:38 -08:00
pakrym-oai
d391f3e2f9 Hide the first websocket retry (#11548)
Sometimes connection needs to be quickly reestablished, don't produce an
error for that.
2026-02-11 22:48:13 -08:00
sayan-oai
d1a97ed852 fix compilation (#11532)
fix broken main
2026-02-11 19:31:13 -08:00
Michael Bolin
abbd74e2be feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
`SandboxPolicy::ReadOnly` previously implied broad read access and could
not express a narrower read surface.
This change introduces an explicit read-access model so we can support
user-configurable read restrictions in follow-up work, while preserving
current behavior today.

It also ensures unsupported backends fail closed for restricted-read
policies instead of silently granting broader access than intended.

## What

- Added `ReadOnlyAccess` in protocol with:
  - `Restricted { include_platform_defaults, readable_roots }`
  - `FullAccess`
- Updated `SandboxPolicy` to carry read-access configuration:
  - `ReadOnly { access: ReadOnlyAccess }`
  - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
- Preserved existing behavior by defaulting current construction paths
to `ReadOnlyAccess::FullAccess`.
- Threaded the new fields through sandbox policy consumers and call
sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
related tests.
- Updated Seatbelt policy generation to honor restricted read roots by
emitting scoped read rules when full read access is not granted.
- Added fail-closed behavior on Linux and Windows backends when
restricted read access is requested but not yet implemented there
(`UnsupportedOperation`).
- Regenerated app-server protocol schema and TypeScript artifacts,
including `ReadOnlyAccess`.

## Compatibility / rollout

- Runtime behavior remains unchanged by default (`FullAccess`).
- API/schema changes are in place so future config wiring can enable
restricted read access without another policy-shape migration.
2026-02-11 18:31:14 -08:00
Ahmed Ibrahim
95fb86810f Update context window after model switch (#11520)
- Update token usage aggregation to refresh model context window after a
model change.
- Add protocol/core tests, including an e2e model-switch test that
validates switching to a smaller model updates telemetry.
2026-02-11 17:41:23 -08:00
Ahmed Ibrahim
40de788c4d Clamp auto-compact limit to context window (#11516)
- Clamp auto-compaction to the minimum of configured limit and 90% of
context window
- Add an e2e compact test for clamped behavior
- Update remote compact tests to account for earlier auto-compaction in
setup turns
2026-02-11 17:41:08 -08:00
Ahmed Ibrahim
6938150c5e Pre-sampling compact with previous model context (#11504)
- Run pre-sampling compact through a single helper that builds
previous-model turn context and compacts before the follow-up request
when switching to a smaller context window.
- Keep compaction events on the parent turn id and add compact suite
coverage for switch-in-session and resume+switch flows.
2026-02-11 17:24:06 -08:00
Anton Panasenko
d3b078c282 Consolidate search_tool feature into apps (#11509)
## Summary
- Remove `Feature::SearchTool` and the `search_tool` config key from the
feature registry/schema.
- Gate `search_tool_bm25` exposure via `Feature::Apps` in
`core/src/tools/spec.rs`.
- Update MCP selection logic in `core/src/codex.rs` to use
`Feature::Apps` for search-tool behavior.
- Update `core/tests/suite/search_tool.rs` to enable `Feature::Apps`.
- Regenerate `core/config.schema.json` via `just write-config-schema`.

## Testing
- `just fmt`
- `cargo test -p codex-core --test all suite::search_tool::`

## Tickets
- None
2026-02-11 16:52:42 -08:00
jif-oai
2fac9cc8cd chore: sub-agent never ask for approval (#11464) 2026-02-11 19:19:37 +00:00
Max Johnson
7053aa5457 Reapply "Add app-server transport layer with websocket support" (#11370)
Reapply "Add app-server transport layer with websocket support" with
additional fixes from https://github.com/openai/codex/pull/11313/changes
to avoid deadlocking.

This reverts commit 47356ff83c.

## Summary

To avoid deadlocking when queues are full, we maintain separate tokio
tasks dedicated to incoming vs outgoing event handling
- split the app-server main loop into two tasks in
`run_main_with_transport`
   - inbound handling (`transport_event_rx`)
   - outbound handling (`outgoing_rx` + `thread_created_rx`)
- separate incoming and outgoing websocket tasks

## Validation

Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent
requests

<img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM"
src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f"
/>

---------

Co-authored-by: jif-oai <jif@openai.com>
2026-02-11 18:13:39 +00:00
pakrym-oai
eac5473114 Do not attempt to append after response.completed (#11402)
Completed responses are fully done, and new response must be created.
2026-02-11 07:45:17 -08:00
Michael Bolin
476c1a7160 Remove test-support feature from codex-core and replace it with explicit test toggles (#11405)
## Why

`codex-core` was being built in multiple feature-resolved permutations
because test-only behavior was modeled as crate features. For a large
crate, those permutations increase compile cost and reduce cache reuse.

## Net Change

- Removed the `test-support` crate feature and related feature wiring so
`codex-core` no longer needs separate feature shapes for test consumers.
- Standardized cross-crate test-only access behind
`codex_core::test_support`.
- External test code now imports helpers from
`codex_core::test_support`.
- Underlying implementation hooks are kept internal (`pub(crate)`)
instead of broadly public.

## Outcome

- Fewer `codex-core` build permutations.
- Better incremental cache reuse across test targets.
- No intended production behavior change.
2026-02-10 22:44:02 -08:00
xl-openai
fdd0cd1de9 feat: support multiple rate limits (#11260)
Added multi-limit support end-to-end by carrying limit_name in
rate-limit snapshots and handling multiple buckets instead of only
codex.
Extended /usage client parsing to consume additional_rate_limits
Updated TUI /status and in-memory state to store/render per-limit
snapshots
Extended app-server rate-limit read response: kept rate_limits and added
rate_limits_by_name.
Adjusted usage-limit error messaging for non-default codex limit buckets
2026-02-10 20:09:31 -08:00