Add `web_search_cached` feature to config. Enables `web_search` tool
with access only to cached/indexed results (see
[docs](https://platform.openai.com/docs/guides/tools-web-search#live-internet-access)).
This takes precedence over the existing `web_search_request`, which
continues to enable `web_search` over live results as it did before.
`web_search_cached` is disabled for review mode, as `web_search_request`
is.
## Summary
Adds a FeatureFlag to enforce UTF8 encoding in powershell, particularly
Windows Powershell v5. This should help address issues like #7290.
Notably, this PR does not include the ability to parse `apply_patch`
invocations within UTF8 shell commands (calls to the freeform tool
should not be impacted). I am leaving this out of scope for now. We
should address before this feature becomes Stable, but those cases are
not the default behavior at this time so we're okay for experimentation
phase. We should continue cleaning up the `apply_patch::invocation`
logic and then can handle it more cleanly.
## Testing
- [x] Adds additional testing
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
## Description
Introduced `ExternalSandbox` policy to cover use case when sandbox
defined by outside environment, effectively it translates to
`SandboxMode#DangerFullAccess` for file system (since sandbox configured
on container level) and configurable `network_access` (either Restricted
or Enabled by outside environment).
as example you can configure `ExternalSandbox` policy as part of
`sendUserTurn` v1 app_server API:
```
{
"conversationId": <id>,
"cwd": <cwd>,
"approvalPolicy": "never",
"sandboxPolicy": {
"type": ""external-sandbox",
"network_access": "enabled"/"restricted"
},
"model": <model>,
"effort": <effort>,
....
}
```
https://github.com/openai/codex/pull/8235 introduced `ConfigBuilder` and
this PR updates all call non-test call sites to use it instead of
`Config::load_from_base_config_with_overrides()`.
This is important because `load_from_base_config_with_overrides()` uses
an empty `ConfigRequirements`, which is a reasonable default for testing
so the tests are not influenced by the settings on the host. This method
is now guarded by `#[cfg(test)]` so it cannot be used by business logic.
Because `ConfigBuilder::build()` is `async`, many of the test methods
had to be migrated to be `async`, as well. On the bright side, this made
it possible to eliminate a bunch of `block_on_future()` stuff.
Previous to this PR, we used a hand-rolled PowerShell parser in
`windows_safe_commands.rs` to take a `&str` of PowerShell script see if
it is equivalent to a list of `execvp(3)` invocations, and if so, we
then test each using `is_safe_powershell_command()` to determine if the
overall command is safe:
6e6338aa87/codex-rs/core/src/command_safety/windows_safe_commands.rs (L89-L98)
Unfortunately, our PowerShell parser did not recognize `@(...)` as a
special construct, so it was treated as an ordinary token. This meant
that the following would erroneously be considered "safe:"
```powershell
ls @(calc.exe)
```
The fix introduced in this PR is to do something comparable what we do
for Bash/Zsh, which is to use a "proper" parser to derive the list of
`execvp(3)` calls. For Bash/Zsh, we rely on
https://crates.io/crates/tree-sitter-bash, but there does not appear to
be a crate of comparable quality for parsing PowerShell statically
(https://github.com/airbus-cert/tree-sitter-powershell/ is the best
thing I found).
Instead, in this PR, we use a PowerShell script to parse the input
PowerShell program to produce the AST.
helpful in the future if we want more granularity for requesting
escalated permissions:
e.g when running in readonly sandbox, model can request to escalate to a
sandbox that allows writes
Currently, we only show the “don’t ask again for commands that start
with…” option when a command is immediately flagged as needing approval.
However, there is another case where we ask for approval: When a command
is initially auto-approved to run within sandbox, but it fails to run
inside sandbox, we would like to attempt to retry running outside of
sandbox. This will require a prompt to the user.
This PR addresses this latter case
- Introduce `with_remote_overrides` and update
`refresh_available_models`
- Put `auth_manager` instead of `auth_mode` on `models_manager`
- Remove `ShellType` and `ReasoningLevel` to use already existing
structs
- Inline response recording during streaming: `run_turn` now records
items as they arrive instead of building a `ProcessedResponseItem` list
and post‑processing via `process_items`.
- Simplify turn handling: `handle_output_item_done` returns the
follow‑up signal + optional tool future; `needs_follow_up` is set only
there, and in‑flight tool futures are drained once at the end (errors
logged, no extra state writes).
- Flattened stream loop: removed `process_items` indirection and the
extra output queue
- - Tests: relaxed `tool_parallelism::tool_results_grouped` to allow any
completion order while still requiring matching call/output IDs.
## Refactor of the `execpolicy` crate
To illustrate why we need this refactor, consider an agent attempting to
run `apple | rm -rf ./`. Suppose `apple` is allowed by `execpolicy`.
Before this PR, `execpolicy` would consider `apple` and `pear` and only
render one rule match: `Allow`. We would skip any heuristics checks on
`rm -rf ./` and immediately approve `apple | rm -rf ./` to run.
To fix this, we now thread a `fallback` evaluation function into
`execpolicy` that runs when no `execpolicy` rules match a given command.
In our example, we would run `fallback` on `rm -rf ./` and prevent
`apple | rm -rf ./` from being run without approval.
this PR enables TUI to approve commands and add their prefixes to an
allowlist:
<img width="708" height="605" alt="Screenshot 2025-11-21 at 4 18 07 PM"
src="https://github.com/user-attachments/assets/56a19893-4553-4770-a881-becf79eeda32"
/>
note: we only show the option to whitelist the command when
1) command is not multi-part (e.g `git add -A && git commit -m 'hello
world'`)
2) command is not already matched by an existing rule
This PR moves `ModelsFamily` to `openai_models`. It also propagates
`ModelsManager` to session services and use it to drive model family. We
also make `derive_default_model_family` private because it's a step
towards what we want: one place that gives model configuration.
This is a second step at having one source of truth for models
information and config: `ModelsManager`.
Next steps would be to remove `ModelsFamily` from config. That's massive
because it's being used in 41 occasions mostly pre launching `codex`.
Also, we need to make `find_family_for_model` private. It's also big
because it's being used in 21 occasions ~ all tests.
# Unified Exec Shell Selection on Windows
## Problem
reference issue #7466
The `unified_exec` handler currently deserializes model-provided tool
calls into the `ExecCommandArgs` struct:
```rust
#[derive(Debug, Deserialize)]
struct ExecCommandArgs {
cmd: String,
#[serde(default)]
workdir: Option<String>,
#[serde(default = "default_shell")]
shell: String,
#[serde(default = "default_login")]
login: bool,
#[serde(default = "default_exec_yield_time_ms")]
yield_time_ms: u64,
#[serde(default)]
max_output_tokens: Option<usize>,
#[serde(default)]
with_escalated_permissions: Option<bool>,
#[serde(default)]
justification: Option<String>,
}
```
The `shell` field uses a hard-coded default:
```rust
fn default_shell() -> String {
"/bin/bash".to_string()
}
```
When the model returns a tool call JSON that only contains `cmd` (which
is the common case), Serde fills in `shell` with this default value.
Later, `get_command` uses that value as if it were a model-provided
shell path:
```rust
fn get_command(args: &ExecCommandArgs) -> Vec<String> {
let shell = get_shell_by_model_provided_path(&PathBuf::from(args.shell.clone()));
shell.derive_exec_args(&args.cmd, args.login)
}
```
On Unix, this usually resolves to `/bin/bash` and works as expected.
However, on Windows this behavior is problematic:
- The hard-coded `"/bin/bash"` is not a valid Windows path.
- `get_shell_by_model_provided_path` treats this as a model-specified
shell, and tries to resolve it (e.g. via `which::which("bash")`), which
may or may not exist and may not behave as intended.
- In practice, this leads to commands being executed under a non-default
or non-existent shell on Windows (for example, WSL bash), instead of the
expected Windows PowerShell or `cmd.exe`.
The core of the issue is that **"model did not specify `shell`" is
currently interpreted as "the model explicitly requested `/bin/bash`"**,
which is both Unix-specific and wrong on Windows.
## Proposed Solution
Instead of hard-coding `"/bin/bash"` into `ExecCommandArgs`, we should
distinguish between:
1. **The model explicitly specifying a shell**, e.g.:
```json
{
"cmd": "echo hello",
"shell": "pwsh"
}
```
In this case, we *do* want to respect the model’s choice and use
`get_shell_by_model_provided_path`.
2. **The model omitting the `shell` field entirely**, e.g.:
```json
{
"cmd": "echo hello"
}
```
In this case, we should *not* assume `/bin/bash`. Instead, we should use
`default_user_shell()` and let the platform decide.
To express this distinction, we can:
1. Change `shell` to be optional in `ExecCommandArgs`:
```rust
#[derive(Debug, Deserialize)]
struct ExecCommandArgs {
cmd: String,
#[serde(default)]
workdir: Option<String>,
#[serde(default)]
shell: Option<String>,
#[serde(default = "default_login")]
login: bool,
#[serde(default = "default_exec_yield_time_ms")]
yield_time_ms: u64,
#[serde(default)]
max_output_tokens: Option<usize>,
#[serde(default)]
with_escalated_permissions: Option<bool>,
#[serde(default)]
justification: Option<String>,
}
```
Here, the absence of `shell` in the JSON is represented as `shell:
None`, rather than a hard-coded string value.
I think this might help with https://github.com/openai/codex/pull/7033
because `create_approval_requirement_for_command()` will soon need
access to `Session.state`, which is a `tokio::sync::Mutex` that needs to
be accessed via `async`.
This updates `ExecParams` so that instead of taking `timeout_ms:
Option<u64>`, it now takes a more general cancellation mechanism,
`ExecExpiration`, which is an enum that includes a
`Cancellation(tokio_util::sync::CancellationToken)` variant.
If the cancellation token is fired, then `process_exec_tool_call()`
returns in the same way as if a timeout was exceeded.
This is necessary so that in #6973, we can manage the timeout logic
external to the `process_exec_tool_call()` because we want to "suspend"
the timeout when an elicitation from a human user is pending.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6972).
* #7005
* #6973
* __->__ #6972
This PR adds the API V2 version of the apply_patch approval flow, which
centers around `ThreadItem::FileChange`.
This PR wires the new RPC (`item/fileChange/requestApproval`, V2 only)
and related events (`item/started`, `item/completed` for
`ThreadItem::FileChange`, which are emitted in both V1 and V2) through
the app-server
protocol. The new approval RPC is only sent when the user initiates a
turn with the new `turn/start` API so we don't break backwards
compatibility with VSCE.
Similar to https://github.com/openai/codex/pull/6758, the approach I
took was to make as few changes to the Codex core as possible,
leveraging existing `EventMsg` core events, and translating those in
app-server. I did have to add a few additional fields to
`EventMsg::PatchApplyBegin` and `EventMsg::PatchApplyEnd`, but those
were fairly lightweight.
However, the `EventMsg`s emitted by core are the following:
```
1) Auto-approved (no request for approval)
- EventMsg::PatchApplyBegin
- EventMsg::PatchApplyEnd
2) Approved by user
- EventMsg::ApplyPatchApprovalRequest
- EventMsg::PatchApplyBegin
- EventMsg::PatchApplyEnd
3) Declined by user
- EventMsg::ApplyPatchApprovalRequest
- EventMsg::PatchApplyBegin
- EventMsg::PatchApplyEnd
```
For a request triggering an approval, this would result in:
```
item/fileChange/requestApproval
item/started
item/completed
```
which is different from the `ThreadItem::CommandExecution` flow
introduced in https://github.com/openai/codex/pull/6758, which does the
below and is preferable:
```
item/started
item/commandExecution/requestApproval
item/completed
```
To fix this, we leverage `TurnSummaryStore` on codex_message_processor
to store a little bit of state, allowing us to fire `item/started` and
`item/fileChange/requestApproval` whenever we receive the underlying
`EventMsg::ApplyPatchApprovalRequest`, and no-oping when we receive the
`EventMsg::PatchApplyBegin` later.
This is much less invasive than modifying the order of EventMsg within
core (I tried).
The resulting payloads:
```
{
"method": "item/started",
"params": {
"item": {
"changes": [
{
"diff": "Hello from Codex!\n",
"kind": "add",
"path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt"
}
],
"id": "call_Nxnwj7B3YXigfV6Mwh03d686",
"status": "inProgress",
"type": "fileChange"
}
}
}
```
```
{
"id": 0,
"method": "item/fileChange/requestApproval",
"params": {
"grantRoot": null,
"itemId": "call_Nxnwj7B3YXigfV6Mwh03d686",
"reason": null,
"threadId": "019a9e11-8295-7883-a283-779e06502c6f",
"turnId": "1"
}
}
```
```
{
"id": 0,
"result": {
"decision": "accept"
}
}
```
```
{
"method": "item/completed",
"params": {
"item": {
"changes": [
{
"diff": "Hello from Codex!\n",
"kind": "add",
"path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt"
}
],
"id": "call_Nxnwj7B3YXigfV6Mwh03d686",
"status": "completed",
"type": "fileChange"
}
}
}
```
This PR threads execpolicy2 into codex-core.
activated via feature flag: exec_policy (on by default)
reads and parses all .codexpolicy files in `codex_home/codex`
refactored tool runtime API to integrate execpolicy logic
---------
Co-authored-by: Michael Bolin <mbolin@openai.com>
Instead of returning structured out and then re-formatting it into
freeform, return the freeform output from shell_command tool.
Keep `shell` as the default tool for GPT-5.
- This PR is to make it on path for truncating by tokens. This path will
be initially used by unified exec and context manager (responsible for
MCP calls mainly).
- We are exposing new config `calls_output_max_tokens`
- Use `tokens` as the main budget unit but truncate based on the model
family by Introducing `TruncationPolicy`.
- Introduce `truncate_text` as a router for truncation based on the
mode.
In next PRs:
- remove truncate_with_line_bytes_budget
- Add the ability to the model to override the token budget.