seeing issues with azure after default-enabling web search: #10071,
#10257.
need to work with azure to fix api-side, for now turning off
default-enable of web_search for azure.
diff is big because i moved logic to reuse
Add a `.sqlite` database to be used to store rollout metatdata (and
later logs)
This PR is phase 1:
* Add the database and the required infrastructure
* Add a backfill of the database
* Persist the newly created rollout both in files and in the DB
* When we need to get metadata or a rollout, consider the `JSONL` as the
source of truth but compare the results with the DB and show any errors
## Summary
#9555 is the start of a rename, so I'm starting to standardize here.
Sets up `model_instructions` templating with a strongly-typed object for
injecting a personality block into the model instructions.
## Testing
- [x] Added tests
- [x] Ran locally
- Add a single builder for developer permissions messaging that accepts
SandboxPolicy and approval policy. This builder now drives the developer
“permissions” message that’s injected at session start and any time
sandbox/approval settings change.
- Trim EnvironmentContext to only include cwd, writable roots, and
shell; removed sandbox/approval/network duplication and adjusted XML
serialization and tests accordingly.
Follow-up: adding a config value to replace the developer permissions
message for custom sandboxes.
Adds a new feature
`enable_request_compression` that will compress using zstd requests to
the codex-backend. Currently only enabled for codex-backend so only enabled for openai providers when using chatgpt::auth even when the feature is enabled
Added a new info log line too for evaluating the compression ratio and
overhead off compressing before requesting. You can enable with
`RUST_LOG=$RUST_LOG,codex_client::transport=info`
```
2026-01-06T00:09:48.272113Z INFO codex_client::transport: Compressed request body with zstd pre_compression_bytes=28914 post_compression_bytes=11485 compression_duration_ms=0
```
We used to override truncation policy by comparing model info vs config
value in context manager. A better way to do it is to construct model
info using the config value
Add `web_search_cached` feature to config. Enables `web_search` tool
with access only to cached/indexed results (see
[docs](https://platform.openai.com/docs/guides/tools-web-search#live-internet-access)).
This takes precedence over the existing `web_search_request`, which
continues to enable `web_search` over live results as it did before.
`web_search_cached` is disabled for review mode, as `web_search_request`
is.
1. Skills load once in core at session start; the cached outcome is
reused across core and surfaced to TUI via SessionConfigured.
2. TUI detects explicit skill selections, and core injects the matching
SKILL.md content into the turn when a selected skill is present.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
- Introduce `openai_models` in `/core`
- Move `PRESETS` under it
- Move `ModelPreset`, `ModelUpgrade`, `ReasoningEffortPreset`,
`ReasoningEffortPreset`, and `ReasoningEffortPreset` to `protocol`
- Introduce `Op::ListModels` and `EventMsg::AvailableModels`
Next steps:
- migrate `app-server` and `tui` to use the introduced Operation
## 🐛 Problem
Users running commands with non-ASCII characters (like Russian text
"пример") in Windows/WSL environments experience garbled text in
VSCode's shell preview window, with Unicode replacement characters (�)
appearing instead of the actual text.
**Issue**: https://github.com/openai/codex/issues/6178
## 🔧 Root Cause
The issue was in `StreamOutput<Vec<u8>>::from_utf8_lossy()` method in
`codex-rs/core/src/exec.rs`, which used `String::from_utf8_lossy()` to
convert shell output bytes to strings. This function immediately
replaces any invalid UTF-8 byte sequences with replacement characters,
without attempting to decode using other common encodings.
In Windows/WSL environments, shell output often uses encodings like:
- Windows-1252 (common Windows encoding)
- Latin-1/ISO-8859-1 (extended ASCII)
## 🛠️ Solution
Replaced the simple `String::from_utf8_lossy()` call with intelligent
encoding detection via a new `bytes_to_string_smart()` function that
tries multiple encoding strategies:
1. **UTF-8** (fast path for valid UTF-8)
2. **Windows-1252** (handles Windows-specific characters in 0x80-0x9F
range)
3. **Latin-1** (fallback for extended ASCII)
4. **Lossy UTF-8** (final fallback, same as before)
## 📁 Changes
### New Files
- `codex-rs/core/src/text_encoding.rs` - Smart encoding detection module
- `codex-rs/core/tests/suite/text_encoding_fix.rs` - Integration tests
### Modified Files
- `codex-rs/core/src/lib.rs` - Added text_encoding module
- `codex-rs/core/src/exec.rs` - Updated StreamOutput::from_utf8_lossy()
- `codex-rs/core/tests/suite/mod.rs` - Registered new test module
## ✅ Testing
- **5 unit tests** covering UTF-8, Windows-1252, Latin-1, and fallback
scenarios
- **2 integration tests** simulating the exact Issue #6178 scenario
- **Demonstrates improvement** over the previous
`String::from_utf8_lossy()` approach
All tests pass:
```bash
cargo test -p codex-core text_encoding
cargo test -p codex-core test_shell_output_encoding_issue_6178
```
## 🎯 Impact
- ✅ **Eliminates garbled text** in VSCode shell preview for non-ASCII
content
- ✅ **Supports Windows/WSL environments** with proper encoding detection
- ✅ **Zero performance impact** for UTF-8 text (fast path)
- ✅ **Backward compatible** - UTF-8 content works exactly as before
- ✅ **Handles edge cases** with robust fallback mechanism
## 🧪 Test Scenarios
The fix has been tested with:
- Russian text ("пример")
- Windows-1252 quotation marks (""test")
- Latin-1 accented characters ("café")
- Mixed encoding content
- Invalid byte sequences (graceful fallback)
## 📋 Checklist
- [X] Addresses the reported issue
- [X] Includes comprehensive tests
- [X] Maintains backward compatibility
- [X] Follows project coding conventions
- [X] No breaking changes
---------
Co-authored-by: Josh McKinney <joshka@openai.com>
This PR threads execpolicy2 into codex-core.
activated via feature flag: exec_policy (on by default)
reads and parses all .codexpolicy files in `codex_home/codex`
refactored tool runtime API to integrate execpolicy logic
---------
Co-authored-by: Michael Bolin <mbolin@openai.com>
## Summary
Consolidates our apply_patch tests into one suite, and ensures each test
case tests the various ways the harness supports apply_patch:
1. Freeform custom tool call
2. JSON function tool
3. Simple shell call
4. Heredoc shell call
There are a few test cases that are specific to a particular variant,
I've left those alone.
## Testing
- [x] This adds a significant number of tests
Unified exec isn't working on Linux because we don't provide the correct
arg0.
The library we use for pty management doesn't allow setting arg0
separately from executable. Use the same aliasing strategy we use for
`apply_patch` for `codex-linux-sandbox`.
Use `#[ctor]` hack to dispatch codex-linux-sandbox calls.
Addresses https://github.com/openai/codex/issues/6450
This PR makes an "insufficient quota" error fatal so we don't attempt to
retry it multiple times in the agent loop.
We have multiple bug reports from users about intermittent retry
behaviors, and this could explain some of them. With this change, we'll
eliminate the retries and surface a clear error message.
The PR is a nearly identical copy of [this
PR](https://github.com/openai/codex/pull/4837) contributed by
@abimaelmartell. The original PR has gone stale. Rather than wait for
the contributor to resolve merge conflicts, I wanted to get this change
in.
Currently, when the access token expires, we attempt to use the refresh
token to acquire a new access token. This works most of the time.
However, there are situations where the refresh token is expired,
exhausted (already used to perform a refresh), or revoked. In those
cases, the current logic treats the error as transient and attempts to
retry it repeatedly.
This PR changes the token refresh logic to differentiate between
permanent and transient errors. It also changes callers to treat the
permanent errors as fatal rather than retrying them. And it provides
better error messages to users so they understand how to address the
problem. These error messages should also help us further understand why
we're seeing examples of refresh token exhaustion.
Here is the error message in the CLI. The same text appears within the
extension.
<img width="863" height="38" alt="image"
src="https://github.com/user-attachments/assets/7ffc0d08-ebf0-4900-b9a9-265064202f4f"
/>
I also correct the spelling of "Re-connecting", which shouldn't have a
hyphen in it.
Testing: I manually tested these code paths by adding temporary code to
programmatically cause my refresh token to be exhausted (by calling the
token refresh endpoint in a tight loop more than 50 times). I then
simulated an access token expiration, which caused the token refresh
logic to be invoked. I confirmed that the updated logic properly handled
the error condition.
Note: We earlier discussed the idea of forcefully logging out the user
at the point where token refresh failed. I made several attempts to do
this, and all of them resulted in a bad UX. It's important to surface
this error to users in a way that explains the problem and tells them
that they need to log in again. We also previously discussed deleting
the auth.json file when this condition is detected. That also creates
problems because it effectively changes the auth status from logged in
to logged out, and this causes odd failures and inconsistent UX. I think
it's therefore better not to delete auth.json in this case. If the user
closes the CLI or VSCE and starts it again, we properly detect that the
access token is expired and the refresh token is "dead", and we force
the user to go through the login flow at that time.
This should address aspects of #6191, #5679, and #5505
## Summary
Duplicates the tests in `apply_patch_cli.rs`, but tests the freeform
apply_patch tool as opposed to the function call path. The good news is
that all the tests pass with zero logical tests, with the exception of
the heredoc, which doesn't really make sense in the freeform tool
context anyway.
@jif-oai since you wrote the original tests in #5557, I'd love your
opinion on the right way to DRY these test cases between the two. Happy
to set up a more sophisticated harness, but didn't want to go down the
rabbit hole until we agreed on the right pattern
## Testing
- [x] These are tests
In this PR, I am exploring migrating task kind to an invocation of
Codex. The main reason would be getting rid off multiple
`ConversationHistory` state and streamlining our context/history
management.
This approach depends on opening a channel between the sub-codex and
codex. This channel is responsible for forwarding `interactive`
(`approvals`) and `non-interactive` events. The `task` is responsible
for handling those events.
This opens the door for implementing `codex as a tool`, replacing
`compact` and `review`, and potentially subagents.
One consideration is this code is very similar to `app-server` specially
in the approval part. If in the future we wanted an interactive
`sub-codex` we should consider using `codex-mcp`
feature: Add "!cmd" user shell execution
This change lets users run local shell commands directly from the TUI by
prefixing their input with ! (e.g. !ls). Output is truncated to keep the
exec cell usable, and Ctrl-C cleanly
interrupts long-running commands (e.g. !sleep 10000).
**Summary of changes**
- Route Op::RunUserShellCommand through a dedicated UserShellCommandTask
(core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs.
- Reuse the existing tool router: the task constructs a ToolCall for the
local_shell tool and relies on ShellHandler, so no manual MCP tool
lookup is required.
- Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the
TUI can show command metadata, live output, and exit status.
**End-to-end flow**
**TUI handling**
1. ChatWidget::submit_user_message (TUI) intercepts messages starting
with !.
2. Non-empty commands dispatch Op::RunUserShellCommand { command };
empty commands surface a help hint.
3. No UserInput items are created, so nothing is enqueued for the model.
**Core submission loop**
4. The submission loop routes the op to handlers::run_user_shell_command
(core/src/codex.rs).
5. A fresh TurnContext is created and Session::spawn_user_shell_command
enqueues UserShellCommandTask.
**Task execution**
6. UserShellCommandTask::run emits TaskStartedEvent, formats the
command, and prepares a ToolCall targeting local_shell.
7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler.
**Shell tool runtime**
8. ShellHandler::run_exec_like launches the process via the unified exec
runtime, honoring sandbox and shell policies, and emits
ExecCommandBegin/End.
9. Stdout/stderr are captured for the UI, but the task does not turn the
resulting ToolOutput into a model response.
**Completion**
10. After ExecCommandEnd, the task finishes without an assistant
message; the session marks it complete and the exec cell displays the
final output.
**Conversation context**
- The command and its output never enter the conversation history or the
model prompt; the flow is local-only.
- Only exec/task events are emitted for UI rendering.
**Demo video**
https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996
move the truncation logic to conversation history to use on any tool
output. This will help us in avoiding edge cases while truncating the
tool calls and mcp calls.
Adds a new ItemStarted event and delivers UserMessage as the first item
type (more to come).
Renames `InputItem` to `UserInput` considering we're using the `Item`
suffix for actual items.
## Summary
This PR is an alternative approach to #4711, but instead of changing our
storage, parses out shell calls in the client and reserializes them on
the fly before we send them out as part of the request.
What this changes:
1. Adds additional serialization logic when the
ApplyPatchToolType::Freeform is in use.
2. Adds a --custom-apply-patch flag to enable this setting on a
session-by-session basis.
This change is delicate, but is not meant to be permanent. It is meant
to be the first step in a migration:
1. (This PR) Add in-flight serialization with config
2. Update model_family default
3. Update serialization logic to store turn outputs in a structured
format, with logic to serialize based on model_family setting.
4. Remove this rewrite in-flight logic.
## Test Plan
- [x] Additional unit tests added
- [x] Integration tests added
- [x] Tested locally
# Tool System Refactor
- Centralizes tool definitions and execution in `core/src/tools/*`:
specs (`spec.rs`), handlers (`handlers/*`), router (`router.rs`),
registry/dispatch (`registry.rs`), and shared context (`context.rs`).
One registry now builds the model-visible tool list and binds handlers.
- Router converts model responses to tool calls; Registry dispatches
with consistent telemetry via `codex-rs/otel` and unified error
handling. Function, Local Shell, MCP, and experimental `unified_exec`
all flow through this path; legacy shell aliases still work.
- Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and
make adding tools predictable and testable.
Example: `read_file`
- Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`,
registered by `build_specs`).
- Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`,
1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation).
- E2E test: `core/tests/suite/read_file.rs` validates the tool returns
the requested lines.
## Next steps:
- Decompose `handle_container_exec_with_params`
- Add parallel tool calls