**Why We Did This**
- The goal is to reduce MCP tool context pollution by not exposing the
full MCP tool list up front
- It forces an explicit discovery step (`search_tool_bm25`) so the model
narrows tool scope before making MCP calls, which helps relevance and
lowers prompt/tool clutter.
**What It Changed**
- Added a new experimental feature flag `search_tool` in
`core/src/features.rs:90` and `core/src/features.rs:430`.
- Added config/schema support for that flag in
`core/config.schema.json:214` and `core/config.schema.json:1235`.
- Added BM25 dependency (`bm25`) in `Cargo.toml:129` and
`core/Cargo.toml:23`.
- Added new tool handler `search_tool_bm25` in
`core/src/tools/handlers/search_tool_bm25.rs:18`.
- Registered the handler and tool spec in
`core/src/tools/handlers/mod.rs:11` and `core/src/tools/spec.rs:780` and
`core/src/tools/spec.rs:1344`.
- Extended `ToolsConfig` to carry `search_tool` enablement in
`core/src/tools/spec.rs:32` and `core/src/tools/spec.rs:56`.
- Injected dedicated developer instructions for tool-discovery workflow
in `core/src/codex.rs:483` and `core/src/codex.rs:1976`, using
`core/templates/search_tool/developer_instructions.md:1`.
- Added session state to store one-shot selected MCP tools in
`core/src/state/session.rs:27` and `core/src/state/session.rs:131`.
- Added filtering so when feature is enabled, only selected MCP tools
are exposed on the next request (then consumed) in
`core/src/codex.rs:3800` and `core/src/codex.rs:3843`.
- Added E2E suite coverage for
enablement/instructions/hide-until-search/one-turn-selection in
`core/tests/suite/search_tool.rs:72`,
`core/tests/suite/search_tool.rs:109`,
`core/tests/suite/search_tool.rs:147`, and
`core/tests/suite/search_tool.rs:218`.
- Refactored test helper utilities to support config-driven tool
collection in `core/tests/suite/tools.rs:281`.
**Net Behavioral Effect**
- With `search_tool` **off**: existing MCP behavior (tools exposed
normally).
- With `search_tool` **on**: MCP tools start hidden, model must call
`search_tool_bm25`, and only returned `selected_tools` are available for
the next model call.
## Summary
When switching models, we should append the instructions of the new
model to the conversation as a developer message.
## Test
- [x] Adds a unit test
Add a centralized FileWatcher in codex-core (using notify) that watches
skill roots from the config layer stack (recursive)
Send `SkillsChanged` events when relevant file system changes are
detected
On `SkillsChanged`:
* Invalidate the skills cache immediately in ThreadManager
* Emit EventMsg::SkillsUpdateAvailable to active sessions
~~* Broadcast a new app-server notification:
SkillsListUpdatedNotification~~
This change does not inject new items into the event stream. That means
the agent will not know about new skills, so it won't be able to
implicitly invoke new skills. It also won't know about changes to
existing skills, so if it has already read the contents of a modified
skill, it will not honor the new behavior.
This change also does not detect modifications to AGENTS.md.
I plan to address these limitations in a follow-on PR modeled after
#9985. Injection of new skills and AGENTS was deemed to risky, hence the
need to split the feature into two stages. The changes in this PR were
designed to easily accommodate the second stage once we have some other
foundational changes in place.
Testing: In addition to automated tests, I did manual testing to confirm
that newly-created skills, deleted skills, and renamed skills are
reflected in the TUI skill picker menu. Also confirmed that
modifications to behaviors for explicitly-invoked skills are honored.
---------
Co-authored-by: Xin Lin <xl@openai.com>
## Description
### What changed
- Switch the arg0 helper root from `~/.codex/tmp/path` to
`~/.codex/tmp/path2`
- Add `Arg0PathEntryGuard` to keep both the `TempDir` and an exclusive
`.lock` file alive for the process lifetime
- Add a startup janitor that scans `path2` and deletes only directories
whose lock can be acquired
### Tests
- `cargo clippy -p codex-arg0`
- `cargo clippy -p codex-core`
- `cargo test -p codex-arg0`
- `cargo test -p codex-core`
seeing issues with azure after default-enabling web search: #10071,
#10257.
need to work with azure to fix api-side, for now turning off
default-enable of web_search for azure.
diff is big because i moved logic to reuse
Add a `.sqlite` database to be used to store rollout metatdata (and
later logs)
This PR is phase 1:
* Add the database and the required infrastructure
* Add a backfill of the database
* Persist the newly created rollout both in files and in the DB
* When we need to get metadata or a rollout, consider the `JSONL` as the
source of truth but compare the results with the DB and show any errors
## Summary
#9555 is the start of a rename, so I'm starting to standardize here.
Sets up `model_instructions` templating with a strongly-typed object for
injecting a personality block into the model instructions.
## Testing
- [x] Added tests
- [x] Ran locally
- Add a single builder for developer permissions messaging that accepts
SandboxPolicy and approval policy. This builder now drives the developer
“permissions” message that’s injected at session start and any time
sandbox/approval settings change.
- Trim EnvironmentContext to only include cwd, writable roots, and
shell; removed sandbox/approval/network duplication and adjusted XML
serialization and tests accordingly.
Follow-up: adding a config value to replace the developer permissions
message for custom sandboxes.
Adds a new feature
`enable_request_compression` that will compress using zstd requests to
the codex-backend. Currently only enabled for codex-backend so only enabled for openai providers when using chatgpt::auth even when the feature is enabled
Added a new info log line too for evaluating the compression ratio and
overhead off compressing before requesting. You can enable with
`RUST_LOG=$RUST_LOG,codex_client::transport=info`
```
2026-01-06T00:09:48.272113Z INFO codex_client::transport: Compressed request body with zstd pre_compression_bytes=28914 post_compression_bytes=11485 compression_duration_ms=0
```
We used to override truncation policy by comparing model info vs config
value in context manager. A better way to do it is to construct model
info using the config value
Add `web_search_cached` feature to config. Enables `web_search` tool
with access only to cached/indexed results (see
[docs](https://platform.openai.com/docs/guides/tools-web-search#live-internet-access)).
This takes precedence over the existing `web_search_request`, which
continues to enable `web_search` over live results as it did before.
`web_search_cached` is disabled for review mode, as `web_search_request`
is.
1. Skills load once in core at session start; the cached outcome is
reused across core and surfaced to TUI via SessionConfigured.
2. TUI detects explicit skill selections, and core injects the matching
SKILL.md content into the turn when a selected skill is present.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
- Introduce `openai_models` in `/core`
- Move `PRESETS` under it
- Move `ModelPreset`, `ModelUpgrade`, `ReasoningEffortPreset`,
`ReasoningEffortPreset`, and `ReasoningEffortPreset` to `protocol`
- Introduce `Op::ListModels` and `EventMsg::AvailableModels`
Next steps:
- migrate `app-server` and `tui` to use the introduced Operation
## 🐛 Problem
Users running commands with non-ASCII characters (like Russian text
"пример") in Windows/WSL environments experience garbled text in
VSCode's shell preview window, with Unicode replacement characters (�)
appearing instead of the actual text.
**Issue**: https://github.com/openai/codex/issues/6178
## 🔧 Root Cause
The issue was in `StreamOutput<Vec<u8>>::from_utf8_lossy()` method in
`codex-rs/core/src/exec.rs`, which used `String::from_utf8_lossy()` to
convert shell output bytes to strings. This function immediately
replaces any invalid UTF-8 byte sequences with replacement characters,
without attempting to decode using other common encodings.
In Windows/WSL environments, shell output often uses encodings like:
- Windows-1252 (common Windows encoding)
- Latin-1/ISO-8859-1 (extended ASCII)
## 🛠️ Solution
Replaced the simple `String::from_utf8_lossy()` call with intelligent
encoding detection via a new `bytes_to_string_smart()` function that
tries multiple encoding strategies:
1. **UTF-8** (fast path for valid UTF-8)
2. **Windows-1252** (handles Windows-specific characters in 0x80-0x9F
range)
3. **Latin-1** (fallback for extended ASCII)
4. **Lossy UTF-8** (final fallback, same as before)
## 📁 Changes
### New Files
- `codex-rs/core/src/text_encoding.rs` - Smart encoding detection module
- `codex-rs/core/tests/suite/text_encoding_fix.rs` - Integration tests
### Modified Files
- `codex-rs/core/src/lib.rs` - Added text_encoding module
- `codex-rs/core/src/exec.rs` - Updated StreamOutput::from_utf8_lossy()
- `codex-rs/core/tests/suite/mod.rs` - Registered new test module
## ✅ Testing
- **5 unit tests** covering UTF-8, Windows-1252, Latin-1, and fallback
scenarios
- **2 integration tests** simulating the exact Issue #6178 scenario
- **Demonstrates improvement** over the previous
`String::from_utf8_lossy()` approach
All tests pass:
```bash
cargo test -p codex-core text_encoding
cargo test -p codex-core test_shell_output_encoding_issue_6178
```
## 🎯 Impact
- ✅ **Eliminates garbled text** in VSCode shell preview for non-ASCII
content
- ✅ **Supports Windows/WSL environments** with proper encoding detection
- ✅ **Zero performance impact** for UTF-8 text (fast path)
- ✅ **Backward compatible** - UTF-8 content works exactly as before
- ✅ **Handles edge cases** with robust fallback mechanism
## 🧪 Test Scenarios
The fix has been tested with:
- Russian text ("пример")
- Windows-1252 quotation marks (""test")
- Latin-1 accented characters ("café")
- Mixed encoding content
- Invalid byte sequences (graceful fallback)
## 📋 Checklist
- [X] Addresses the reported issue
- [X] Includes comprehensive tests
- [X] Maintains backward compatibility
- [X] Follows project coding conventions
- [X] No breaking changes
---------
Co-authored-by: Josh McKinney <joshka@openai.com>
This PR threads execpolicy2 into codex-core.
activated via feature flag: exec_policy (on by default)
reads and parses all .codexpolicy files in `codex_home/codex`
refactored tool runtime API to integrate execpolicy logic
---------
Co-authored-by: Michael Bolin <mbolin@openai.com>
## Summary
Consolidates our apply_patch tests into one suite, and ensures each test
case tests the various ways the harness supports apply_patch:
1. Freeform custom tool call
2. JSON function tool
3. Simple shell call
4. Heredoc shell call
There are a few test cases that are specific to a particular variant,
I've left those alone.
## Testing
- [x] This adds a significant number of tests
Unified exec isn't working on Linux because we don't provide the correct
arg0.
The library we use for pty management doesn't allow setting arg0
separately from executable. Use the same aliasing strategy we use for
`apply_patch` for `codex-linux-sandbox`.
Use `#[ctor]` hack to dispatch codex-linux-sandbox calls.
Addresses https://github.com/openai/codex/issues/6450
This PR makes an "insufficient quota" error fatal so we don't attempt to
retry it multiple times in the agent loop.
We have multiple bug reports from users about intermittent retry
behaviors, and this could explain some of them. With this change, we'll
eliminate the retries and surface a clear error message.
The PR is a nearly identical copy of [this
PR](https://github.com/openai/codex/pull/4837) contributed by
@abimaelmartell. The original PR has gone stale. Rather than wait for
the contributor to resolve merge conflicts, I wanted to get this change
in.
Currently, when the access token expires, we attempt to use the refresh
token to acquire a new access token. This works most of the time.
However, there are situations where the refresh token is expired,
exhausted (already used to perform a refresh), or revoked. In those
cases, the current logic treats the error as transient and attempts to
retry it repeatedly.
This PR changes the token refresh logic to differentiate between
permanent and transient errors. It also changes callers to treat the
permanent errors as fatal rather than retrying them. And it provides
better error messages to users so they understand how to address the
problem. These error messages should also help us further understand why
we're seeing examples of refresh token exhaustion.
Here is the error message in the CLI. The same text appears within the
extension.
<img width="863" height="38" alt="image"
src="https://github.com/user-attachments/assets/7ffc0d08-ebf0-4900-b9a9-265064202f4f"
/>
I also correct the spelling of "Re-connecting", which shouldn't have a
hyphen in it.
Testing: I manually tested these code paths by adding temporary code to
programmatically cause my refresh token to be exhausted (by calling the
token refresh endpoint in a tight loop more than 50 times). I then
simulated an access token expiration, which caused the token refresh
logic to be invoked. I confirmed that the updated logic properly handled
the error condition.
Note: We earlier discussed the idea of forcefully logging out the user
at the point where token refresh failed. I made several attempts to do
this, and all of them resulted in a bad UX. It's important to surface
this error to users in a way that explains the problem and tells them
that they need to log in again. We also previously discussed deleting
the auth.json file when this condition is detected. That also creates
problems because it effectively changes the auth status from logged in
to logged out, and this causes odd failures and inconsistent UX. I think
it's therefore better not to delete auth.json in this case. If the user
closes the CLI or VSCE and starts it again, we properly detect that the
access token is expired and the refresh token is "dead", and we force
the user to go through the login flow at that time.
This should address aspects of #6191, #5679, and #5505
## Summary
Duplicates the tests in `apply_patch_cli.rs`, but tests the freeform
apply_patch tool as opposed to the function call path. The good news is
that all the tests pass with zero logical tests, with the exception of
the heredoc, which doesn't really make sense in the freeform tool
context anyway.
@jif-oai since you wrote the original tests in #5557, I'd love your
opinion on the right way to DRY these test cases between the two. Happy
to set up a more sophisticated harness, but didn't want to go down the
rabbit hole until we agreed on the right pattern
## Testing
- [x] These are tests
In this PR, I am exploring migrating task kind to an invocation of
Codex. The main reason would be getting rid off multiple
`ConversationHistory` state and streamlining our context/history
management.
This approach depends on opening a channel between the sub-codex and
codex. This channel is responsible for forwarding `interactive`
(`approvals`) and `non-interactive` events. The `task` is responsible
for handling those events.
This opens the door for implementing `codex as a tool`, replacing
`compact` and `review`, and potentially subagents.
One consideration is this code is very similar to `app-server` specially
in the approval part. If in the future we wanted an interactive
`sub-codex` we should consider using `codex-mcp`
feature: Add "!cmd" user shell execution
This change lets users run local shell commands directly from the TUI by
prefixing their input with ! (e.g. !ls). Output is truncated to keep the
exec cell usable, and Ctrl-C cleanly
interrupts long-running commands (e.g. !sleep 10000).
**Summary of changes**
- Route Op::RunUserShellCommand through a dedicated UserShellCommandTask
(core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs.
- Reuse the existing tool router: the task constructs a ToolCall for the
local_shell tool and relies on ShellHandler, so no manual MCP tool
lookup is required.
- Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the
TUI can show command metadata, live output, and exit status.
**End-to-end flow**
**TUI handling**
1. ChatWidget::submit_user_message (TUI) intercepts messages starting
with !.
2. Non-empty commands dispatch Op::RunUserShellCommand { command };
empty commands surface a help hint.
3. No UserInput items are created, so nothing is enqueued for the model.
**Core submission loop**
4. The submission loop routes the op to handlers::run_user_shell_command
(core/src/codex.rs).
5. A fresh TurnContext is created and Session::spawn_user_shell_command
enqueues UserShellCommandTask.
**Task execution**
6. UserShellCommandTask::run emits TaskStartedEvent, formats the
command, and prepares a ToolCall targeting local_shell.
7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler.
**Shell tool runtime**
8. ShellHandler::run_exec_like launches the process via the unified exec
runtime, honoring sandbox and shell policies, and emits
ExecCommandBegin/End.
9. Stdout/stderr are captured for the UI, but the task does not turn the
resulting ToolOutput into a model response.
**Completion**
10. After ExecCommandEnd, the task finishes without an assistant
message; the session marks it complete and the exec cell displays the
final output.
**Conversation context**
- The command and its output never enter the conversation history or the
model prompt; the flow is local-only.
- Only exec/task events are emitted for UI rendering.
**Demo video**
https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996
move the truncation logic to conversation history to use on any tool
output. This will help us in avoiding edge cases while truncating the
tool calls and mcp calls.