codex

mirror of https://github.com/openai/codex.git synced 2026-06-01 19:02:59 +00:00

Author	SHA1	Message	Date
jif-oai	4cef89a122	chore: rename unified exec sessions (#8822 ) Renaming done by Codex	2026-01-07 16:12:47 +00:00
sayan-oai	54ded1a3c0	add web_search_cached flag (#8795 ) Add `web_search_cached` feature to config. Enables `web_search` tool with access only to cached/indexed results (see [docs](https://platform.openai.com/docs/guides/tools-web-search#live-internet-access)). This takes precedence over the existing `web_search_request`, which continues to enable `web_search` over live results as it did before. `web_search_cached` is disabled for review mode, as `web_search_request` is.	2026-01-06 14:53:59 -08:00
jif-oai	8858012fd1	chore: emit unified exec begin only when PTY exist (#8780 )	2026-01-06 13:12:54 +00:00
pakrym-oai	96fdbdd434	Add ExecPolicyManager (#8349 ) Move exec policy management into services to keep turn context immutable.	2025-12-22 09:59:32 -08:00
Dylan Hurd	33e1d0844a	feat(windows) start powershell in utf-8 mode (#7902 ) ## Summary Adds a FeatureFlag to enforce UTF8 encoding in powershell, particularly Windows Powershell v5. This should help address issues like #7290. Notably, this PR does not include the ability to parse `apply_patch` invocations within UTF8 shell commands (calls to the freeform tool should not be impacted). I am leaving this out of scope for now. We should address before this feature becomes Stable, but those cases are not the default behavior at this time so we're okay for experimentation phase. We should continue cleaning up the `apply_patch::invocation` logic and then can handle it more cleanly. ## Testing - [x] Adds additional testing	2025-12-22 09:36:44 -08:00
Ahmed Ibrahim	f0dc6fd3c7	Rename OpenAI models to models manager (#8346 ) # External (non-OpenAI) Pull Request Requirements Before opening this Pull Request, please read the dedicated "Contributing" markdown file or your PR may be closed: https://github.com/openai/codex/blob/main/docs/contributing.md If your PR conforms to our contribution guidelines, replace this text with a detailed and high quality description of your changes. Include a link to a bug report or enhancement request.	2025-12-19 16:20:05 -08:00
Anton Panasenko	3429de21b3	feat: introduce ExternalSandbox policy (#8290 ) ## Description Introduced `ExternalSandbox` policy to cover use case when sandbox defined by outside environment, effectively it translates to `SandboxMode#DangerFullAccess` for file system (since sandbox configured on container level) and configurable `network_access` (either Restricted or Enabled by outside environment). as example you can configure `ExternalSandbox` policy as part of `sendUserTurn` v1 app_server API: ``` { "conversationId": <id>, "cwd": <cwd>, "approvalPolicy": "never", "sandboxPolicy": { "type": ""external-sandbox", "network_access": "enabled"/"restricted" }, "model": <model>, "effort": <effort>, .... } ```	2025-12-18 17:02:03 -08:00
Michael Bolin	3d4ced3ff5	chore: migrate from Config::load_from_base_config_with_overrides to ConfigBuilder (#8276 ) https://github.com/openai/codex/pull/8235 introduced `ConfigBuilder` and this PR updates all call non-test call sites to use it instead of `Config::load_from_base_config_with_overrides()`. This is important because `load_from_base_config_with_overrides()` uses an empty `ConfigRequirements`, which is a reasonable default for testing so the tests are not influenced by the settings on the host. This method is now guarded by `#[cfg(test)]` so it cannot be used by business logic. Because `ConfigBuilder::build()` is `async`, many of the test methods had to be migrated to be `async`, as well. On the bright side, this made it possible to eliminate a bunch of `block_on_future()` stuff.	2025-12-18 16:12:52 -08:00
jif-oai	45c164a982	nit: doc (#8186 )	2025-12-17 15:29:29 +00:00
jif-oai	2e7e4f6ea6	nit: drop dead branch with `unified_exec` tool (#8182 )	2025-12-17 13:55:13 +00:00
jif-oai	813bdb9010	feat: fallback unified_exec to shell_command (#8075 )	2025-12-17 10:29:45 +00:00
jif-oai	d7482510b1	nit: trace span for regular task (#8053 ) Logs are too spammy --------- Co-authored-by: Anton Panasenko <apanasenko@openai.com>	2025-12-16 16:53:15 +00:00
Dylan Hurd	b9d1a087ee	chore(shell_command) fix freeform timeout output (#7791 ) ## Summary Adding an additional integration test for timeout_ms ## Testing - [x] these are tests	2025-12-15 19:26:39 -08:00
Ahmed Ibrahim	d802b18716	fix parallel tool calls (#7956 )	2025-12-16 01:28:27 +00:00
Anton Panasenko	ad7b9d63c3	[codex] add otel tracing (#7844 )	2025-12-12 17:07:17 -08:00
Michael Bolin	9009490357	fix: use PowerShell to parse PowerShell (#7607 ) Previous to this PR, we used a hand-rolled PowerShell parser in `windows_safe_commands.rs` to take a `&str` of PowerShell script see if it is equivalent to a list of `execvp(3)` invocations, and if so, we then test each using `is_safe_powershell_command()` to determine if the overall command is safe: `6e6338aa87/codex-rs/core/src/command_safety/windows_safe_commands.rs (L89-L98)` Unfortunately, our PowerShell parser did not recognize `@(...)` as a special construct, so it was treated as an ordinary token. This meant that the following would erroneously be considered "safe:" ```powershell ls @(calc.exe) ``` The fix introduced in this PR is to do something comparable what we do for Bash/Zsh, which is to use a "proper" parser to derive the list of `execvp(3)` calls. For Bash/Zsh, we rely on https://crates.io/crates/tree-sitter-bash, but there does not appear to be a crate of comparable quality for parsing PowerShell statically (https://github.com/airbus-cert/tree-sitter-powershell/ is the best thing I found). Instead, in this PR, we use a PowerShell script to parse the input PowerShell program to produce the AST.	2025-12-12 13:06:49 -08:00
Ahmed Ibrahim	b7fa7ca8e9	Update Model Info (#7853 )	2025-12-11 14:06:07 -08:00
Ahmed Ibrahim	b9fb3b81e5	Chore: limit find family visability (#7891 ) a little bit more code quality of life	2025-12-11 13:30:56 -08:00
jif-oai	29381ba5c2	feat: add shell snapshot for shell command (#7786 )	2025-12-11 13:46:43 +00:00
Eric Traut	c4af707e09	Removed experimental "command risk assessment" feature (#7799 ) This experimental feature received lukewarm reception during internal testing. Removing from the code base.	2025-12-10 09:48:11 -08:00
zhao-oai	e0fb3ca1db	refactoring with_escalated_permissions to use SandboxPermissions instead (#7750 ) helpful in the future if we want more granularity for requesting escalated permissions: e.g when running in readonly sandbox, model can request to escalate to a sandbox that allows writes	2025-12-10 17:18:48 +00:00
jif-oai	0ad54982ae	chore: rework unified exec events (#7775 )	2025-12-10 10:30:38 +00:00
jif-oai	7836aeddae	feat: shell snapshotting (#7641 )	2025-12-09 18:36:58 +00:00
pakrym-oai	ac5fa6baf8	Do not emit start/end events for write stdin (#7561 )	2025-12-08 15:23:02 -08:00
jif-oai	da983c1761	feat: add is-mutating detection for shell command handler (#7729 )	2025-12-08 18:42:09 +00:00
zhao-oai	c2bdee0946	proposing execpolicy amendment when prompting due to sandbox denial (#7653 ) Currently, we only show the “don’t ask again for commands that start with…” option when a command is immediately flagged as needing approval. However, there is another case where we ask for approval: When a command is initially auto-approved to run within sandbox, but it fails to run inside sandbox, we would like to attempt to retry running outside of sandbox. This will require a prompt to the user. This PR addresses this latter case	2025-12-08 17:55:20 +00:00
Eric Traut	acb8ed493f	Fixed regression for chat endpoint; missing tools name caused litellm proxy to crash (#7724 ) This PR addresses https://github.com/openai/codex/issues/7051	2025-12-08 00:49:51 -08:00
Dylan Hurd	a8cbbdbc6e	feat(core) Add login to shell_command tool (#6846 ) ## Summary Adds the `login` parameter to the `shell_command` tool - optional, defaults to true. ## Testing - [x] Tested locally	2025-12-05 11:03:25 -08:00
Ahmed Ibrahim	7b359c9c8e	Call models endpoint in models manager (#7616 ) - Introduce `with_remote_overrides` and update `refresh_available_models` - Put `auth_manager` instead of `auth_mode` on `models_manager` - Remove `ShellType` and `ReasoningLevel` to use already existing structs	2025-12-04 18:28:03 -08:00
Ahmed Ibrahim	6e6338aa87	Inline response recording and remove process_items indirection (#7310 ) - Inline response recording during streaming: `run_turn` now records items as they arrive instead of building a `ProcessedResponseItem` list and post‑processing via `process_items`. - Simplify turn handling: `handle_output_item_done` returns the follow‑up signal + optional tool future; `needs_follow_up` is set only there, and in‑flight tool futures are drained once at the end (errors logged, no extra state writes). - Flattened stream loop: removed `process_items` indirection and the extra output queue - - Tests: relaxed `tool_parallelism::tool_results_grouped` to allow any completion order while still requiring matching call/output IDs.	2025-12-04 12:17:54 -08:00
jif-oai	36edb412b1	fix: release session ID when not used (#7592 )	2025-12-04 17:42:16 +00:00
zhao-oai	3d35cb4619	Refactor execpolicy fallback evaluation (#7544 ) ## Refactor of the `execpolicy` crate To illustrate why we need this refactor, consider an agent attempting to run `apple \| rm -rf ./`. Suppose `apple` is allowed by `execpolicy`. Before this PR, `execpolicy` would consider `apple` and `pear` and only render one rule match: `Allow`. We would skip any heuristics checks on `rm -rf ./` and immediately approve `apple \| rm -rf ./` to run. To fix this, we now thread a `fallback` evaluation function into `execpolicy` that runs when no `execpolicy` rules match a given command. In our example, we would run `fallback` on `rm -rf ./` and prevent `apple \| rm -rf ./` from being run without approval.	2025-12-03 23:39:48 -08:00
zhao-oai	e925a380dc	whitelist command prefix integration in core and tui (#7033 ) this PR enables TUI to approve commands and add their prefixes to an allowlist: <img width="708" height="605" alt="Screenshot 2025-11-21 at 4 18 07 PM" src="https://github.com/user-attachments/assets/56a19893-4553-4770-a881-becf79eeda32" /> note: we only show the option to whitelist the command when 1) command is not multi-part (e.g `git add -A && git commit -m 'hello world'`) 2) command is not already matched by an existing rule	2025-12-03 23:17:02 -08:00
Ahmed Ibrahim	cee37a32b2	Migrate model family to models manager (#7565 ) This PR moves `ModelsFamily` to `openai_models`. It also propagates `ModelsManager` to session services and use it to drive model family. We also make `derive_default_model_family` private because it's a step towards what we want: one place that gives model configuration. This is a second step at having one source of truth for models information and config: `ModelsManager`. Next steps would be to remove `ModelsFamily` from config. That's massive because it's being used in 41 occasions mostly pre launching `codex`. Also, we need to make `find_family_for_model` private. It's also big because it's being used in 21 occasions ~ all tests.	2025-12-03 18:49:47 -08:00
jif-oai	42ae738f67	feat: model warning in case of apply patch (#7494 )	2025-12-03 09:07:31 +00:00
Robby He	f3989f6092	fix(unified_exec): use platform default shell when unified_exec shell… (#7486 ) # Unified Exec Shell Selection on Windows ## Problem reference issue #7466 The `unified_exec` handler currently deserializes model-provided tool calls into the `ExecCommandArgs` struct: ```rust #[derive(Debug, Deserialize)] struct ExecCommandArgs { cmd: String, #[serde(default)] workdir: Option<String>, #[serde(default = "default_shell")] shell: String, #[serde(default = "default_login")] login: bool, #[serde(default = "default_exec_yield_time_ms")] yield_time_ms: u64, #[serde(default)] max_output_tokens: Option<usize>, #[serde(default)] with_escalated_permissions: Option<bool>, #[serde(default)] justification: Option<String>, } ``` The `shell` field uses a hard-coded default: ```rust fn default_shell() -> String { "/bin/bash".to_string() } ``` When the model returns a tool call JSON that only contains `cmd` (which is the common case), Serde fills in `shell` with this default value. Later, `get_command` uses that value as if it were a model-provided shell path: ```rust fn get_command(args: &ExecCommandArgs) -> Vec<String> { let shell = get_shell_by_model_provided_path(&PathBuf::from(args.shell.clone())); shell.derive_exec_args(&args.cmd, args.login) } ``` On Unix, this usually resolves to `/bin/bash` and works as expected. However, on Windows this behavior is problematic: - The hard-coded `"/bin/bash"` is not a valid Windows path. - `get_shell_by_model_provided_path` treats this as a model-specified shell, and tries to resolve it (e.g. via `which::which("bash")`), which may or may not exist and may not behave as intended. - In practice, this leads to commands being executed under a non-default or non-existent shell on Windows (for example, WSL bash), instead of the expected Windows PowerShell or `cmd.exe`. The core of the issue is that "model did not specify `shell`" is currently interpreted as "the model explicitly requested `/bin/bash`", which is both Unix-specific and wrong on Windows. ## Proposed Solution Instead of hard-coding `"/bin/bash"` into `ExecCommandArgs`, we should distinguish between: 1. The model explicitly specifying a shell, e.g.: ```json { "cmd": "echo hello", "shell": "pwsh" } ``` In this case, we do want to respect the model’s choice and use `get_shell_by_model_provided_path`. 2. The model omitting the `shell` field entirely, e.g.: ```json { "cmd": "echo hello" } ``` In this case, we should not assume `/bin/bash`. Instead, we should use `default_user_shell()` and let the platform decide. To express this distinction, we can: 1. Change `shell` to be optional in `ExecCommandArgs`: ```rust #[derive(Debug, Deserialize)] struct ExecCommandArgs { cmd: String, #[serde(default)] workdir: Option<String>, #[serde(default)] shell: Option<String>, #[serde(default = "default_login")] login: bool, #[serde(default = "default_exec_yield_time_ms")] yield_time_ms: u64, #[serde(default)] max_output_tokens: Option<usize>, #[serde(default)] with_escalated_permissions: Option<bool>, #[serde(default)] justification: Option<String>, } ``` Here, the absence of `shell` in the JSON is represented as `shell: None`, rather than a hard-coded string value.	2025-12-02 21:49:25 -08:00
Michael Bolin	ec93b6daf3	chore: make create_approval_requirement_for_command an async fn (#7501 ) I think this might help with https://github.com/openai/codex/pull/7033 because `create_approval_requirement_for_command()` will soon need access to `Session.state`, which is a `tokio::sync::Mutex` that needs to be accessed via `async`.	2025-12-02 15:01:15 -08:00
jif-oai	72b95db12f	feat: intercept apply_patch for unified_exec (#7446 )	2025-12-02 17:54:02 +00:00
jif-oai	28ff364c3a	feat: update process ID for event handling (#7261 )	2025-11-25 14:21:05 -08:00
zhao-oai	87b211709e	bypass sandbox for policy approved commands (#7110 ) allowing cmds greenlit by execpolicy to bypass sandbox + minor refactor for a world where we have execpolicy rules with specific sandbox requirements	2025-11-21 18:03:23 -05:00
Ahmed Ibrahim	d5f661c91d	enable unified exec for experiments (#7118 )	2025-11-21 13:10:01 -08:00
Michael Bolin	f56d1dc8fc	feat: update process_exec_tool_call() to take a cancellation token (#6972 ) This updates `ExecParams` so that instead of taking `timeout_ms: Option<u64>`, it now takes a more general cancellation mechanism, `ExecExpiration`, which is an enum that includes a `Cancellation(tokio_util::sync::CancellationToken)` variant. If the cancellation token is fired, then `process_exec_tool_call()` returns in the same way as if a timeout was exceeded. This is necessary so that in #6973, we can manage the timeout logic external to the `process_exec_tool_call()` because we want to "suspend" the timeout when an elicitation from a human user is pending. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6972). * #7005 * #6973 * __->__ #6972	2025-11-20 16:29:57 -08:00
pakrym-oai	856f97f449	Delete shell_command feature (#7024 )	2025-11-20 14:14:56 -08:00
pakrym-oai	30ca89424c	Always fallback to real shell (#6953 ) Either cmd.exe or `/bin/sh`.	2025-11-20 10:58:46 -08:00
Owen Lin	d6c30ed25e	[app-server] feat: v2 apply_patch approval flow (#6760 ) This PR adds the API V2 version of the apply_patch approval flow, which centers around `ThreadItem::FileChange`. This PR wires the new RPC (`item/fileChange/requestApproval`, V2 only) and related events (`item/started`, `item/completed` for `ThreadItem::FileChange`, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new `turn/start` API so we don't break backwards compatibility with VSCE. Similar to https://github.com/openai/codex/pull/6758, the approach I took was to make as few changes to the Codex core as possible, leveraging existing `EventMsg` core events, and translating those in app-server. I did have to add a few additional fields to `EventMsg::PatchApplyBegin` and `EventMsg::PatchApplyEnd`, but those were fairly lightweight. However, the `EventMsg`s emitted by core are the following: ``` 1) Auto-approved (no request for approval)  - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd 2) Approved by user - EventMsg::ApplyPatchApprovalRequest - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd 3) Declined by user - EventMsg::ApplyPatchApprovalRequest - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd ``` For a request triggering an approval, this would result in: ``` item/fileChange/requestApproval item/started item/completed ``` which is different from the `ThreadItem::CommandExecution` flow introduced in https://github.com/openai/codex/pull/6758, which does the below and is preferable: ``` item/started item/commandExecution/requestApproval item/completed ``` To fix this, we leverage `TurnSummaryStore` on codex_message_processor to store a little bit of state, allowing us to fire `item/started` and `item/fileChange/requestApproval` whenever we receive the underlying `EventMsg::ApplyPatchApprovalRequest`, and no-oping when we receive the `EventMsg::PatchApplyBegin` later. This is much less invasive than modifying the order of EventMsg within core (I tried). The resulting payloads: ``` { "method": "item/started", "params": { "item": { "changes": [ { "diff": "Hello from Codex!\n", "kind": "add", "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt" } ], "id": "call_Nxnwj7B3YXigfV6Mwh03d686", "status": "inProgress", "type": "fileChange" } } } ``` ``` { "id": 0, "method": "item/fileChange/requestApproval", "params": { "grantRoot": null, "itemId": "call_Nxnwj7B3YXigfV6Mwh03d686", "reason": null, "threadId": "019a9e11-8295-7883-a283-779e06502c6f", "turnId": "1" } } ``` ``` { "id": 0, "result": { "decision": "accept" } } ``` ``` { "method": "item/completed", "params": { "item": { "changes": [ { "diff": "Hello from Codex!\n", "kind": "add", "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt" } ], "id": "call_Nxnwj7B3YXigfV6Mwh03d686", "status": "completed", "type": "fileChange" } } } ```	2025-11-19 20:13:31 -08:00
zhao-oai	65c13f1ae7	execpolicy2 core integration (#6641 ) This PR threads execpolicy2 into codex-core. activated via feature flag: exec_policy (on by default) reads and parses all .codexpolicy files in `codex_home/codex` refactored tool runtime API to integrate execpolicy logic --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-11-19 16:50:43 -08:00
Ahmed Ibrahim	efebc62fb7	Move shell to use `truncate_text` (#6842 ) Move shell to use the configurable `truncate_text` --------- Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-11-19 01:56:08 -08:00
pakrym-oai	ee0484a98c	shell_command returns freeform output (#6860 ) Instead of returning structured out and then re-formatting it into freeform, return the freeform output from shell_command tool. Keep `shell` as the default tool for GPT-5.	2025-11-18 23:38:43 -08:00
Dylan Hurd	29ca89c414	chore(config) enable shell_command (#6843 ) ## Summary Enables shell_command as default for `gpt-5` and `codex-` models. ## Testing - [x] Updated unit tests	2025-11-18 12:46:02 -08:00
Ahmed Ibrahim	3de8790714	Add the utility to truncate by tokens (#6746 ) - This PR is to make it on path for truncating by tokens. This path will be initially used by unified exec and context manager (responsible for MCP calls mainly). - We are exposing new config `calls_output_max_tokens` - Use `tokens` as the main budget unit but truncate based on the model family by Introducing `TruncationPolicy`. - Introduce `truncate_text` as a router for truncation based on the mode. In next PRs: - remove truncate_with_line_bytes_budget - Add the ability to the model to override the token budget.	2025-11-18 11:36:23 -08:00

... 11 12 13 14 15

711 Commits