codex

mirror of https://github.com/openai/codex.git synced 2026-06-01 19:02:59 +00:00

Author	SHA1	Message	Date
zhao-oai	e9e644a119	fixing localshell tool calls (#6823 ) - Local-shell tool responses were always tagged as `ExecCommandSource::UserShell` because handler would call `run_exec_like` with `is_user_shell_cmd` set to true. - Treat `ToolPayload::LocalShell` the same as other model generated shell tool calls by deleting `is_user_shell_cmd` from `run_exec_like` (since actual user shell commands follow a separate code path)	2025-11-18 17:28:26 +00:00
Dylan Hurd	28ebe1c97a	fix(windows) shell_command on windows, minor parsing (#6811 ) ## Summary Enables shell_command for windows users, and starts adding some basic command parsing here, to at least remove powershell prefixes. We'll follow this up with command parsing but I wanted to land this change separately with some basic UX. NOTE: This implementation parses bash and powershell on both platforms. In theory this is possible, since you can use git bash on windows or powershell on linux. In practice, this may not be worth the complexity of supporting, so I don't feel strongly about the current approach vs. platform-specific branching. ## Testing - [x] Added a bunch of tests - [x] Ran on both windows and os x	2025-11-17 22:23:53 -08:00
Owen Lin	cecbd5b021	[app-server] feat: add v2 command execution approval flow (#6758 ) This PR adds the API V2 version of the command‑execution approval flow for the shell tool. This PR wires the new RPC (`item/commandExecution/requestApproval`, V2 only) and related events (`item/started`, `item/completed`, and `item/commandExecution/delta`, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new `turn/start` API so we don't break backwards compatibility with VSCE. The approach I took was to make as few changes to the Codex core as possible, leveraging existing `EventMsg` core events, and translating those in app-server. I did have to add additional fields to `EventMsg::ExecCommandEndEvent` to capture the command's input so that app-server can statelessly transform these events to a `ThreadItem::CommandExecution` item for the `item/completed` event. Once we stabilize the API and it's complete enough for our partners, we can work on migrating the core to be aware of command execution items as a first-class concept. Note: We'll need followup work to make sure these APIs work for the unified exec tool, but will wait til that's stable and landed before doing a pass on app-server. 
Example payloads below: ``` { "method": "item/started", "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": null, "exitCode": null, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "inProgress", "type": "commandExecution" } } } ``` ``` { "id": 0, "method": "item/commandExecution/requestApproval", "params": { "itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "reason": "Need to create file in /tmp which is outside workspace sandbox", "risk": null, "threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a", "turnId": "1" } } ``` ``` { "id": 0, "result": { "acceptSettings": { "forSession": false }, "decision": "accept" } } ``` ``` { "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": 224, "exitCode": 0, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "completed", "type": "commandExecution" } } } ```	2025-11-18 00:23:54 +00:00
Jeremy Rose	ab2e7499f8	core: add a feature to disable the shell tool (#6481 ) `--disable shell_tool` disables the built-in shell tool. This is useful for MCP-only operation. --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-11-17 22:56:19 +00:00
Dylan Hurd	daf77b8452	chore(core) Update shell instructions (#6679 ) ## Summary Consolidates `shell` and `shell_command` tool instructions. ## Testing - [x] Updated tests, tested locally	2025-11-17 13:05:15 -08:00
Jeremy Rose	03ffe4d595	core/tui: non-blocking MCP startup (#6334 ) This makes MCP startup not block TUI startup. Messages sent while MCPs are booting will be queued. https://github.com/user-attachments/assets/96e1d234-5d8f-4932-a935-a675d35c05e0 Fixes #6317 --------- Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-11-17 11:26:11 -08:00
Dylan Hurd	497fb4a19c	fix(core) serialize shell_command (#6744 ) ## Summary Ensures we're serializing calls to `shell_command` ## Testing - [x] Added unit test	2025-11-16 23:16:51 -08:00
jif-oai	63c8c01f40	feat: better UI for unified_exec (#6515 ) <img width="376" height="132" alt="Screenshot 2025-11-12 at 17 36 22" src="https://github.com/user-attachments/assets/ce693f0d-5ca0-462e-b170-c20811dcc8d5" />	2025-11-14 16:31:12 +01:00
Ahmed Ibrahim	9890ceb939	Avoid double truncation (#6631 ) 1. Avoid double truncation by giving 10% above the tool default constant 2. Add tests that fail when const = 1	2025-11-13 16:59:31 -08:00
pakrym-oai	7b027e7536	Revert "Revert "Overhaul shell detection and centralize command generation for unified exec"" (#6607 ) Reverts openai/codex#6606	2025-11-13 16:45:17 -08:00
pakrym-oai	0792a7953d	Update default yield time (#6610 ) 10s for exec and 250ms for write_stdin	2025-11-13 10:24:41 -08:00
pakrym-oai	e6995174c1	Revert "Overhaul shell detection and centralize command generation for unified exec" (#6606 ) Reverts openai/codex#6577	2025-11-13 08:43:00 -08:00
pakrym-oai	d28e912214	Overhaul shell detection and centralize command generation for unified exec (#6577 ) This fixes command display for unified exec. All `cd`s and `ls`es are now parsed. <img width="452" height="237" alt="image" src="https://github.com/user-attachments/assets/ce92d81f-f74c-485a-9b34-1eaa29290ec6" /> Deletes a ton of tests that were doing nothing from shell.rs. --------- Co-authored-by: Pavel Krymets <pavel@krymets.com>	2025-11-13 08:28:09 -08:00
Ahmed Ibrahim	b1979b70a8	remove porcupine model slug (#6580 )	2025-11-13 04:43:31 +00:00
Ahmed Ibrahim	ad7eaa80f9	Change model picker to include gpt5.1 (#6569 ) - Change the presets - Change the tests that make sure we keep the list of tools updated - Filter out deprecated models	2025-11-12 19:44:53 -08:00
jif-oai	e00eb50db3	feat: only wait for mutating tools for ghost commit (#6534 )	2025-11-12 18:16:32 +00:00
Michael Bolin	29364f3a9b	feat: shell_command tool (#6510 ) This adds support for a new variant of the shell tool behind a flag. To test, run `codex` with `--enable shell_command_tool`, which will register the tool with Codex under the name `shell_command` that accepts the following shape: ```python { command: str workdir: str \| None, timeout_ms: int \| None, with_escalated_permissions: bool \| None, justification: str \| None, } ``` This is comparable to the existing tool registered under `shell`/`container.exec`. The primary difference is that it accepts `command` as a `str` instead of a `str[]`. The `shell_command` tool executes by running `execvp(["bash", "-lc", command])`, though the exact arguments to `execvp(3)` depend on the user's default shell. The hypothesis is that this will simplify things for the model. For example, on Windows, instead of generating: ```json {"command": ["pwsh.exe", "-NoLogo", "-Command", "ls -Name"]} ``` The model could simply generate: ```json {"command": "ls -Name"} ``` As part of this change, I extracted some logic out of `user_shell.rs` as `Shell::derive_exec_args()` so that it can be reused in `codex-rs/core/src/tools/handlers/shell.rs`. Note the original code generated exec arg lists like: ```javascript ["bash", "-lc", command] ["zsh", "-lc", command] ["pwsh.exe", "-NoProfile", "-Command", command] ``` Using `-l` for Bash and Zsh, but then specifying `-NoProfile` for PowerShell seemed inconsistent to me, so I changed this in the new implementation while also adding a `use_login_shell: bool` option to make this explicit. If we decide to add a `login: bool` to `ShellCommandToolCallParams` like we have for unified exec: `807e2c27f0/codex-rs/core/src/tools/handlers/unified_exec.rs (L33-L34)` Then this should make it straightforward to support.	2025-11-12 08:18:57 -08:00
pakrym-oai	807e2c27f0	Add unified exec escalation handling and tests (#6492 ) Similar implementation to the shell tool	2025-11-11 08:19:35 -08:00
jif-oai	ad279eacdc	nit: logs to trace (#6503 )	2025-11-11 13:37:06 +00:00
jif-oai	f01f2ec9ee	feat: add workdir to unified_exec (#6466 )	2025-11-10 19:53:36 +00:00
pakrym-oai	91b16b8682	Don't request approval for safe commands in unified exec (#6380 )	2025-11-07 16:36:04 -08:00
pakrym-oai	4c1a6f0ee0	Promote shell config tool to model family config (#6351 )	2025-11-07 10:11:11 -08:00
pakrym-oai	c368c6aeea	Remove shell tool when unified exec is enabled (#6345 ) Also drop streamable shell that's just an alias for unified exec.	2025-11-06 15:46:24 -08:00
pakrym-oai	b5349202e9	Freeform unified exec output formatting (#6233 )	2025-11-06 22:14:27 +00:00
Ahmed Ibrahim	1a89f70015	refactor Conversation history file into its own directory (#6229 ) This is just a refactor of the `conversation_history` file, breaking it up into multiple smaller ones with helpers. This refactor will help us move more functionality related to context management here in a clean way.	2025-11-05 10:49:35 -08:00
Ahmed Ibrahim	6ee7fbcfff	feat: add the time after aborting (#5996 ) Tell the model how much time passed after the user aborted the call.	2025-11-03 11:44:06 -08:00
Eric Traut	d5853d9c47	Changes to sandbox command assessment feature based on initial experiment feedback (#6091 ) * Removed sandbox risk categories; feedback indicates that these are not that useful and "less is more" * Tweaked the assessment prompt to generate terser answers * Fixed bug in orchestrator that prevents this feature from being exposed in the extension	2025-11-01 14:52:23 -07:00
iceweasel-oai	87cce88f48	Windows Sandbox - Alpha version (#4905 ) - Added the new codex-windows-sandbox crate that builds both a library entry point (run_windows_sandbox_capture) and a CLI executable to launch commands inside a Windows restricted-token sandbox, including ACL management, capability SID provisioning, network lockdown, and output capture (windows-sandbox-rs/src/lib.rs:167, windows-sandbox-rs/src/main.rs:54). - Introduced the experimental WindowsSandbox feature flag and wiring so Windows builds can opt into the sandbox: SandboxType::WindowsRestrictedToken, the in-process execution path, and platform sandbox selection now honor the flag (core/src/features.rs:47, core/src/config.rs:1224, core/src/safety.rs:19, core/src/sandboxing/mod.rs:69, core/src/exec.rs:79, core/src/exec.rs:172). - Updated workspace metadata to include the new crate and its Windows-specific dependencies so the core crate can link against it (codex-rs/ Cargo.toml:91, core/Cargo.toml:86). - Added a PowerShell bootstrap script that installs the Windows toolchain, required CLI utilities, and builds the workspace to ease development on the platform (scripts/setup-windows.ps1:1). - Landed a Python smoke-test suite that exercises read-only/workspace-write policies, ACL behavior, and network denial for the Windows sandbox binary (windows-sandbox-rs/sandbox_smoketests.py:1).	2025-10-30 15:51:57 -07:00
jif-oai	3183935bd7	feat: add output even in sandbox denied (#5908 )	2025-10-29 18:21:18 +00:00
Abhishek Bhardwaj	89591e4246	feature: Add "!cmd" user shell execution (#2471 ) feature: Add "!cmd" user shell execution This change lets users run local shell commands directly from the TUI by prefixing their input with ! (e.g. !ls). Output is truncated to keep the exec cell usable, and Ctrl-C cleanly interrupts long-running commands (e.g. !sleep 10000). Summary of changes - Route Op::RunUserShellCommand through a dedicated UserShellCommandTask (core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs. - Reuse the existing tool router: the task constructs a ToolCall for the local_shell tool and relies on ShellHandler, so no manual MCP tool lookup is required. - Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the TUI can show command metadata, live output, and exit status. End-to-end flow TUI handling 1. ChatWidget::submit_user_message (TUI) intercepts messages starting with !. 2. Non-empty commands dispatch Op::RunUserShellCommand { command }; empty commands surface a help hint. 3. No UserInput items are created, so nothing is enqueued for the model. Core submission loop 4. The submission loop routes the op to handlers::run_user_shell_command (core/src/codex.rs). 5. A fresh TurnContext is created and Session::spawn_user_shell_command enqueues UserShellCommandTask. Task execution 6. UserShellCommandTask::run emits TaskStartedEvent, formats the command, and prepares a ToolCall targeting local_shell. 7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler. Shell tool runtime 8. ShellHandler::run_exec_like launches the process via the unified exec runtime, honoring sandbox and shell policies, and emits ExecCommandBegin/End. 9. Stdout/stderr are captured for the UI, but the task does not turn the resulting ToolOutput into a model response. Completion 10. After ExecCommandEnd, the task finishes without an assistant message; the session marks it complete and the exec cell displays the final output. 
Conversation context - The command and its output never enter the conversation history or the model prompt; the flow is local-only. - Only exec/task events are emitted for UI rendering. Demo video https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996	2025-10-29 00:31:20 -07:00
Gabriel Peal	b0bdc04c30	[MCP] Render MCP tool call result images to the model (#5600 ) It's pretty amazing we have gotten here without the ability for the model to see image content from MCP tool calls. This PR builds off of 4391 and fixes #4819. I would like @KKcorps to get adequate credit here but I also want to get this fix in ASAP, so I gave him a week to update it and haven't gotten a response, so I'm going to take it across the finish line. This test highlights how absurd the current situation is. I asked the model to read this image using the Chrome MCP <img width="2378" height="674" alt="image" src="https://github.com/user-attachments/assets/9ef52608-72a2-4423-9f5e-7ae36b2b56e0" /> After this change, it correctly outputs: > Captured the page: image dhows a dark terminal-style UI labeled `OpenAI Codex (v0.0.0)` with prompt `model: gpt-5-codex medium` and working directory `/codex/codex-rs` (and more) Before this change, it said: > Took the full-page screenshot you asked for. It shows a long, horizontally repeating pattern of stylized people in orange, light-blue, and mustard clothing, holding hands in alternating poses against a white background. No text or other graphics-just rows of flat illustration stretching off to the right. Without this change, the Figma, Playwright, Chrome, and other visual MCP servers are pretty much entirely useless. I tested this change with the openai responses api as well as a third party completions api	2025-10-27 17:55:57 -04:00
Ahmed Ibrahim	7226365397	Centralize truncation in conversation history (#5652 ) move the truncation logic to conversation history to use on any tool output. This will help us in avoiding edge cases while truncating the tool calls and mcp calls.	2025-10-27 14:05:35 -07:00
jif-oai	e92c4f6561	feat: async ghost commit (#5618 )	2025-10-27 10:09:10 +00:00
Eric Traut	f8af4f5c8d	Added model summary and risk assessment for commands that violate sandbox policy (#5536 ) This PR adds support for a model-based summary and risk assessment for commands that violate the sandbox policy and require user approval. This aids the user in evaluating whether the command should be approved. The feature works by taking a failed command and passing it back to the model and asking it to summarize the command, give it a risk level (low, medium, high) and a risk category (e.g. "data deletion" or "data exfiltration"). It uses a new conversation thread so the context in the existing thread doesn't influence the answer. If the call to the model fails or takes longer than 5 seconds, it falls back to the current behavior. For now, this is an experimental feature and is gated by a config key `experimental_sandbox_command_assessment`. Here is a screen shot of the approval prompt showing the risk assessment and summary. <img width="723" height="282" alt="image" src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b" />	2025-10-24 15:23:44 -07:00
jif-oai	a6b9471548	feat: end events on unified exec (#5551 )	2025-10-23 18:51:34 +01:00
jif-oai	6745b12427	chore: testing on apply_path (#5557 )	2025-10-23 17:00:48 +01:00
Ahmed Ibrahim	f59978ed3d	Handle cancelling/aborting while processing a turn (#5543 ) Currently we collect all turn items in a vector, then add them to the history on success. This results in losing those items on errors, including aborting with `ctrl+c`. This PR: - Adds the ability for the tool call to handle cancellation - Bubbles the turn items up to where we are recording this info Admittedly, this logic is ad hoc and doesn't handle a lot of error edge cases. The right thing to do is to record to the history on the spot as `items`/`tool calls output` come in. However, this isn't possible because different `task_kind`s have different `conversation_histories`. `try_run_turn` has no idea which thread we are using. We also cannot pass an `arc` to the `conversation_histories` because it's a private element of `state`. That said, `abort` is the most common case and we should cover it until we remove `task kind`	2025-10-23 08:47:10 -07:00
jif-oai	892eaff46d	fix: approval issue (#5525 )	2025-10-23 11:13:53 +01:00
jif-oai	8e291a1706	chore: clean `handle_container_exec_with_params` (#5516 ) Drop `handle_container_exec_with_params` to have simpler and more straight forward execution path	2025-10-23 09:24:01 +01:00
jif-oai	bac7acaa7c	chore: clean spec tests (#5517 )	2025-10-22 18:30:33 +01:00
jif-oai	00b1e130b3	chore: align unified_exec (#5442 ) Align `unified_exec` with b implementation	2025-10-22 11:50:18 +01:00
jif-oai	da82153a8d	fix: fix UI issue when 0 omitted lines (#5451 )	2025-10-21 16:45:05 +00:00
jif-oai	4bd68e4d9e	feat: emit events for unified_exec (#5448 )	2025-10-21 17:32:39 +01:00
pakrym-oai	1b10a3a1b2	Enable plan tool by default (#5384 ) ## Summary - make the plan tool available by default by removing the feature flag and always registering the handler - drop plan-tool CLI and API toggles across the exec, TUI, MCP server, and app server code paths - update tests and configs to reflect the always-on plan tool and guard workspace restriction tests against env leakage ## Testing Manually tested the extension. ------ https://chatgpt.com/codex/tasks/task_i_68f67a3ff2d083209562a773f814c1f9	2025-10-21 16:25:05 +00:00
pakrym-oai	789e65b9d2	Pass TurnContext around instead of sub_id (#5421 ) Today `sub_id` is an ID of a single incoming Codex Op submission. We then associate all events triggered by this operation using the same `sub_id`. At the same time we are also creating a TurnContext per submission and we'd like to start associating some events (item added/item completed) with an entire turn instead of just the operation that started it. Using turn context when sending events gives us flexibility to change the notification scheme.	2025-10-21 08:04:16 -07:00
pakrym-oai	9c903c4716	Add ItemStarted/ItemCompleted events for UserInputItem (#5306 ) Adds a new ItemStarted event and delivers UserMessage as the first item type (more to come). Renames `InputItem` to `UserInput` considering we're using the `Item` suffix for actual items.	2025-10-20 13:34:44 -07:00
jif-oai	5e4f3bbb0b	chore: rework tools execution workflow (#5278 ) Re-work the tool execution flow. Read `orchestrator.rs` to understand the structure	2025-10-20 20:57:37 +01:00
jif-oai	6915ba2100	feat: better UX during refusal (#5260 ) <img width="568" height="169" alt="Screenshot 2025-10-16 at 18 28 05" src="https://github.com/user-attachments/assets/f42e8d6d-b7de-4948-b291-a5fbb50b1312" />	2025-10-17 11:06:55 +02:00
Gabriel Peal	40fba1bb4c	[MCP] Add support for resources (#5239 ) This PR adds support for [MCP resources](https://modelcontextprotocol.io/specification/2025-06-18/server/resources) by adding three new tools for the model: 1. `list_resources` 2. `list_resource_templates` 3. `read_resource` These 3 tools correspond to the [three primary MCP resource protocol messages](https://modelcontextprotocol.io/specification/2025-06-18/server/resources#protocol-messages). Example of listing and reading a GitHub resource template <img width="2984" height="804" alt="CleanShot 2025-10-15 at 17 31 10" src="https://github.com/user-attachments/assets/89b7f215-2e2a-41c5-90dd-b932ac84a585" /> `/mcp` with Figma configured <img width="2984" height="442" alt="CleanShot 2025-10-15 at 18 29 35" src="https://github.com/user-attachments/assets/a7578080-2ed2-4c59-b9b4-d8461f90d8ee" /> Fixes #4956	2025-10-17 01:05:15 -04:00
jif-oai	f7b4e29609	feat: feature flag (#4948 ) Add a proper feature flag instead of having custom flags for everything. This is just for experimental/WIP parts of the code. It can be used through the CLI: ```bash codex --enable unified_exec --disable view_image_tool ``` Or in the `config.toml`: ```toml # Global toggles applied to every profile unless overridden. [features] apply_patch_freeform = true view_image_tool = false ``` Follow-up: In a following PR, the goal is to have default `bundles` of features that we can associate with a model	2025-10-14 17:50:00 +00:00
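
The `shell_command` commit above (29364f3a9b) describes extracting `Shell::derive_exec_args()` and adding a `use_login_shell: bool` option so that Bash, Zsh, and PowerShell treat profile loading consistently. The following is a minimal sketch of that idea, not the code from the repository: the `ShellKind` enum, the free-function signature, and the choice of `-c` for the non-login case are assumptions made for illustration. Only the three argument lists quoted in the commit message (`-lc` for Bash and Zsh, `-NoProfile -Command` for PowerShell) come from the source.

```rust
/// Hypothetical shell kinds, named for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShellKind {
    Bash,
    Zsh,
    PowerShell,
}

/// Build the argv used to run `command` (a single string) in the given shell.
///
/// Assumption: a login shell loads the user's profile, so for Bash and Zsh
/// we pass `-lc`, and for PowerShell we simply omit `-NoProfile`.
pub fn derive_exec_args(kind: ShellKind, command: &str, use_login_shell: bool) -> Vec<String> {
    let mut args: Vec<String> = Vec::new();
    match kind {
        ShellKind::Bash | ShellKind::Zsh => {
            let exe = if kind == ShellKind::Bash { "bash" } else { "zsh" };
            // `-c` for the non-login case is an assumption of this sketch.
            let flag = if use_login_shell { "-lc" } else { "-c" };
            args.push(exe.to_string());
            args.push(flag.to_string());
        }
        ShellKind::PowerShell => {
            args.push("pwsh.exe".to_string());
            if !use_login_shell {
                // Skip the profile only when a login shell was not requested.
                args.push("-NoProfile".to_string());
            }
            args.push("-Command".to_string());
        }
    }
    // The command is always passed as one trailing argument.
    args.push(command.to_string());
    args
}
```

Under these assumptions, `derive_exec_args(ShellKind::PowerShell, "ls -Name", false)` yields the `pwsh.exe -NoProfile -Command` form quoted in the commit message, while passing `true` drops `-NoProfile`, mirroring what `-l` does for Bash and Zsh.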


711 Commits